How to Find Good Finite-Length Codes: 
Abstract: We explain how to optimize finite-length LDPC codes for transmission over the binary erasure channel. Our approach relies on an analytic approximation of the erasure probability. This is in turn based on a finite-length scaling result to model large-scale erasures and a union bound involving minimal stopping sets to take into account small error events. We show that the performance of optimized ensembles as observed in simulations is well described by our approximation. Although we only address the case of transmission over the binary erasure channel, our method should be applicable to a more general setting.



I. Introduction 

In this paper, we consider transmission using random elements from the standard ensemble of low-density parity-check (LDPC) codes defined by the degree distribution pair $(\lambda, \rho)$. For an introduction to LDPC codes and the standard notation see [1]. In [2], one of the authors (AM) suggested that the probability of error of iterative coding systems follows a scaling law. In [3]-[5], it was shown that this is indeed true for LDPC codes, assuming that transmission takes place over the BEC. Strictly speaking, scaling laws describe the asymptotic behavior of the error probability close to the threshold for increasing blocklengths. However, as observed empirically in the papers mentioned above, scaling laws provide good approximations to the error probability also away from the threshold and already for modest blocklengths. This is the starting point for our finite-length optimization.

In [3], [5] the form of the scaling law for transmission over the BEC was derived, and it was shown how to compute the scaling parameters by solving a system of ordinary differential equations. This system was called covariance evolution, in analogy to density evolution. Density evolution concerns the evolution of the average number of erasures still contained in the graph during the decoding process, whereas covariance evolution concerns the evolution of its variance. Whereas Luby et al. [6] found an explicit solution to the density evolution equations, to date no such solution is known for the system of covariance equations. Covariance evolution must therefore be integrated numerically. Unfortunately, the dimension of the ODE system ranges from hundreds to thousands of equations for typical examples. As a consequence, numerical integration can be quite time consuming. This is a serious problem if we want to use scaling laws in the context of optimization, where the computation of scaling parameters must be repeated for many different ensembles during the optimization process.

In this paper, we make two main contributions. First, we 
derive explicit analytic expressions for the scaling parameters 
as a function of the degree distribution pair and quantities 
which appear in density evolution. Second, we provide an 
accurate approximation to the erasure probability stemming 
from small stopping sets and resulting in the erasure floor. 

The paper is organized as follows. Section II describes our approximation for the error probability, the scaling law being discussed in Section II-B and the error floor in Section II-C. We combine these results and give in Section II-D an approximation to the erasure probability curve, denoted by $P(n, \lambda, \rho, \epsilon)$, that can be computed efficiently for any blocklength, degree distribution pair, and channel parameter. The basic ideas behind the explicit determination of the scaling parameters (together with the resulting expressions) are collected in Section III. Finally, the most technical (and tricky) part of this computation is deferred to Section IV.

As a motivation for some of the rather technical points to come, we start in Section I-A by showing how $P(n, \lambda, \rho, \epsilon)$ can be used to perform an efficient finite-length optimization.

A. Optimization 

The optimization procedure takes as input a blocklength $n$, the BEC erasure probability $\epsilon$, and a target probability of erasure, call it $P_{\text{target}}$. Either the bit or the block erasure probability can be considered. We want to find a degree distribution pair $(\lambda, \rho)$ of maximum rate such that $P(n, \lambda, \rho, \epsilon) < P_{\text{target}}$, where $P(n, \lambda, \rho, \epsilon)$ is the approximation discussed in the introduction.

Let us describe an efficient procedure to accomplish this optimization locally (many equivalent approaches are possible). Although providing a global optimization scheme goes beyond the scope of this paper, the local procedure was found empirically to often converge to the global optimum.

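It is well known [1] that the design rate $r(\lambda, \rho)$ associated with a degree distribution pair $(\lambda, \rho)$ is equal to

$$r(\lambda, \rho) = 1 - \frac{\sum_i \rho_i / i}{\sum_i \lambda_i / i}. \qquad (1)$$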
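As a quick check of (1), the following Python sketch evaluates the design rate from the coefficients of $\lambda(x)$ and $\rho(x)$. It assumes, purely for illustration, that each polynomial is stored as a dictionary mapping the exponent $k$ to its coefficient, so that $k$ corresponds to node degree $i = k + 1$; the function name `design_rate` is our own.

```python
def design_rate(lam, rho):
    """Design rate r(lambda, rho) of Eq. (1).

    lam, rho: dicts {exponent k: coefficient} of the edge-perspective
    polynomials lambda(x) = sum_k lam[k] x^k and rho(x) = sum_k rho[k] x^k.
    Exponent k corresponds to node degree i = k + 1, so that
    sum_i lambda_i / i = sum_k lam[k] / (k + 1), the integral of lambda on [0, 1].
    """
    int_lam = sum(c / (k + 1) for k, c in lam.items())
    int_rho = sum(c / (k + 1) for k, c in rho.items())
    return 1.0 - int_rho / int_lam


# Optimized pair (8)-(9) of Example 1 below
lam = {1: 0.0739196, 2: 0.657891, 12: 0.268189}
rho = {4: 0.390753, 5: 0.361589, 9: 0.247658}
print(round(design_rate(lam, rho), 4))  # 0.4107
```

The same function reproduces the rate 0.2029 quoted for the random starting pair (4)-(5).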
For "most" ensembles the actual rate of a randomly chosen element of the ensemble LDPC$(n, \lambda, \rho)$ is close to this design rate [7]. In any case, $r(\lambda, \rho)$ is always a lower bound. Assume we change the degree distribution pair slightly by $\Delta\lambda(x) = \sum_i \Delta\lambda_i x^{i-1}$ and $\Delta\rho(x) = \sum_i \Delta\rho_i x^{i-1}$, where $\Delta\lambda(1) = 0 = \Delta\rho(1)$, and assume that the change is sufficiently small so that $\lambda + \Delta\lambda$ as well as $\rho + \Delta\rho$ are still valid degree distributions (non-negative coefficients). A quick calculation then shows that the design rate changes by



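$$r(\lambda + \Delta\lambda, \rho + \Delta\rho) - r(\lambda, \rho) \simeq \frac{(1 - r)\sum_i \Delta\lambda_i / i - \sum_i \Delta\rho_i / i}{\sum_i \lambda_i / i}. \qquad (2)$$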
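The linearization (2) is easy to check numerically. The sketch below (again with our hypothetical dictionary representation {exponent $k$: coefficient}, degree $i = k + 1$) compares the first-order prediction with the exact change of the design rate for a small perturbation that keeps $\Delta\lambda(1) = \Delta\rho(1) = 0$.

```python
def integral(p):
    # sum_i p_i / i, i.e. the integral of p(x) over [0, 1]
    return sum(c / (k + 1) for k, c in p.items())


def design_rate(lam, rho):
    return 1.0 - integral(rho) / integral(lam)


def linear_rate_change(lam, rho, dlam, drho):
    """First-order change of the design rate: right-hand side of Eq. (2)."""
    r = design_rate(lam, rho)
    return ((1.0 - r) * integral(dlam) - integral(drho)) / integral(lam)


def perturb(p, dp):
    return {k: p.get(k, 0.0) + dp.get(k, 0.0) for k in set(p) | set(dp)}


lam = {1: 0.0739196, 2: 0.657891, 12: 0.268189}
rho = {4: 0.390753, 5: 0.361589, 9: 0.247658}
dlam = {1: 0.001, 12: -0.001}    # sums to zero: Delta lambda(1) = 0
drho = {4: 0.0005, 9: -0.0005}   # sums to zero: Delta rho(1) = 0
exact = design_rate(perturb(lam, dlam), perturb(rho, drho)) - design_rate(lam, rho)
print(exact, linear_rate_change(lam, rho, dlam, drho))
```

The two printed numbers agree up to a second-order term, which is what justifies using (2) inside a linear program with a small step size.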
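In the same way, the erasure probability changes (according to the approximation) by

$$P(n, \lambda + \Delta\lambda, \rho + \Delta\rho, \epsilon) - P(n, \lambda, \rho, \epsilon) \simeq \sum_i \frac{\partial P}{\partial \lambda_i}\Delta\lambda_i + \sum_i \frac{\partial P}{\partial \rho_i}\Delta\rho_i. \qquad (3)$$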

Equations (2) and (3) give rise to a simple linear program to locally optimize the degree distribution: start with some initial degree distribution pair $(\lambda, \rho)$. If $P(n, \lambda, \rho, \epsilon) < P_{\text{target}}$, then increase the rate by a repeated application of the following linear program.

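LP 1: [Linear program to increase the rate]

$$
\begin{aligned}
\max\Big\{ &(1 - r)\sum_i \frac{\Delta\lambda_i}{i} - \sum_i \frac{\Delta\rho_i}{i} \;\Big|\; \\
&\sum_i \Delta\lambda_i = 0;\quad -\min\{\delta, \lambda_i\} \le \Delta\lambda_i \le \delta; \\
&\sum_i \Delta\rho_i = 0;\quad -\min\{\delta, \rho_i\} \le \Delta\rho_i \le \delta; \\
&\sum_i \frac{\partial P}{\partial \lambda_i}\Delta\lambda_i + \sum_i \frac{\partial P}{\partial \rho_i}\Delta\rho_i \le P_{\text{target}} - P(n, \lambda, \rho, \epsilon)\Big\}.
\end{aligned}
$$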

Here $\delta$ is a sufficiently small non-negative number that ensures that the degree distribution pair changes only slightly at each step, so that the changes of the rate and of the probability of erasure are accurately described by the linear approximation. The value $\delta$ is best adapted dynamically to ensure convergence: one can start with a large value and decrease it as we get closer to the final answer. The objective function in LP 1 is, up to the positive factor $1/\sum_i \lambda_i / i$, equal to the total derivative of the rate as a function of the change of the degree distribution (cf. (2)). Several rounds of this linear program will gradually improve the rate of the code ensemble, while keeping the erasure probability below the target (last inequality).
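A minimal sketch of how LP 1 might be assembled for an off-the-shelf LP solver. It only builds the data in the common "minimize $c \cdot z$ subject to $A_{ub} z \le b_{ub}$, $A_{eq} z = b_{eq}$ and box bounds" form. The gradients $\partial P/\partial\lambda_i$ and $\partial P/\partial\rho_i$ are assumed to be supplied by the caller (for example from finite differences of $P$), and the function and argument names are hypothetical. Since the factor $1/\sum_i \lambda_i/i$ in (2) is positive, it is dropped from the objective without changing the maximizer, and the maximization is turned into a minimization by negating the objective.

```python
def build_lp1(lam, rho, dP_dlam, dP_drho, P_now, P_target, delta):
    """Data for LP 1: minimize c.z s.t. A_ub z <= b_ub, A_eq z = b_eq, bounds.

    lam, rho:         dicts {exponent k: coefficient}, degree i = k + 1.
    dP_dlam, dP_drho: dicts {exponent k: dP/d(coefficient)} (assumed given);
                      their keys decide which coefficients may change.
    The variable vector is z = (dlam_k for k in kl) + (drho_k for k in kr).
    """
    kl, kr = sorted(dP_dlam), sorted(dP_drho)
    int_lam = sum(c / (k + 1) for k, c in lam.items())
    int_rho = sum(c / (k + 1) for k, c in rho.items())
    r = 1.0 - int_rho / int_lam                      # design rate, Eq. (1)
    # maximize (1 - r) sum dlam_i / i - sum drho_i / i -> minimize the negative
    c = [-(1.0 - r) / (k + 1) for k in kl] + [1.0 / (k + 1) for k in kr]
    # linearized erasure probability must stay below the target, Eq. (3)
    A_ub = [[dP_dlam[k] for k in kl] + [dP_drho[k] for k in kr]]
    b_ub = [P_target - P_now]
    # both distributions must still sum to one
    A_eq = [[1.0] * len(kl) + [0.0] * len(kr),
            [0.0] * len(kl) + [1.0] * len(kr)]
    b_eq = [0.0, 0.0]
    # -min(delta, coefficient) <= change <= delta keeps coefficients >= 0
    bounds = ([(-min(delta, lam.get(k, 0.0)), delta) for k in kl]
              + [(-min(delta, rho.get(k, 0.0)), delta) for k in kr])
    return c, A_ub, b_ub, A_eq, b_eq, bounds
```

The returned pieces match the argument layout of, for instance, SciPy's `scipy.optimize.linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)`, but any LP solver would do. After each solve, the step is applied to $(\lambda, \rho)$ and $\delta$ can be reduced as described above.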

Sometimes it is necessary to initialize the optimization 
procedure with degree distribution pairs that do not fulfill 
the target erasure probability constraint. This is for instance 
the case if the optimization is repeated for a large number 
of "randomly" chosen initial conditions. In this way, we can 
check whether the procedure always converges to the same 
point (thus suggesting that a global optimum was found), or 
otherwise, pick the best outcome of many trials. To this end we 
define a linear program that decreases the erasure probability. 



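LP 2: [Linear program to decrease $P(n, \lambda, \rho, \epsilon)$]

$$
\begin{aligned}
\min\Big\{ &\sum_i \frac{\partial P}{\partial \lambda_i}\Delta\lambda_i + \sum_i \frac{\partial P}{\partial \rho_i}\Delta\rho_i \;\Big|\; \\
&\sum_i \Delta\lambda_i = 0;\quad -\min\{\delta, \lambda_i\} \le \Delta\lambda_i \le \delta; \\
&\sum_i \Delta\rho_i = 0;\quad -\min\{\delta, \rho_i\} \le \Delta\rho_i \le \delta\Big\}.
\end{aligned}
$$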
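LP 2 has no constraint that couples $\lambda$ and $\rho$, so it splits into two independent problems of the form "minimize $\sum_k g_k d_k$ subject to $\sum_k d_k = 0$ and $l_k \le d_k \le u_k$". Such a problem can be solved exactly without an LP library by a greedy rule: start every $d_k$ at its lower bound and hand the resulting deficit to the coordinates with the smallest gradient first. The sketch below is our own illustration of this observation, not a procedure taken from the paper; the gradients are again assumed to be given, and all names are hypothetical.

```python
def greedy_zero_sum(grad, lower, upper):
    """Minimize sum_k grad[k] * d[k] s.t. sum_k d[k] = 0, lower <= d <= upper.

    Requires sum(lower) <= 0 <= sum(upper). Starting from the lower bounds,
    the amount still needed is given to the cheapest coordinates first,
    which is optimal for this single-constraint LP (continuous knapsack).
    """
    d = dict(lower)
    budget = -sum(lower.values())          # how much must be added back
    for k in sorted(grad, key=grad.get):   # increasing gradient
        step = min(upper[k] - lower[k], budget)
        d[k] += step
        budget -= step
        if budget <= 0:
            break
    return d


def lp2_step(lam, rho, dP_dlam, dP_drho, delta):
    """One round of LP 2; dP_dlam, dP_drho are dicts {exponent k: gradient}."""
    steps = []
    for p, g in ((lam, dP_dlam), (rho, dP_drho)):
        lower = {k: -min(delta, p.get(k, 0.0)) for k in g}
        upper = {k: delta for k in g}
        steps.append(greedy_zero_sum(g, lower, upper))
    return steps[0], steps[1]
```

For LP 1 this shortcut does not apply directly, because its last inequality couples the $\lambda$ and $\rho$ variables.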



Example 1: [Sample Optimization] Let us show a sample optimization. Assume we transmit over a BEC with channel erasure probability $\epsilon = 0.5$. We are interested in a blocklength of $n = 5000$ bits, and the maximum variable and check degrees we allow are $l_{\max} = 13$ and $r_{\max} = 10$, respectively. We constrain the block erasure probability to be smaller than $P_{\text{target}} = 10^{-4}$. We further count only erasures of size larger than or equal to $s_{\min} = 6$ bits. This corresponds to looking at an expurgated ensemble, i.e., we are looking at the subset of codes of the ensemble that do not contain stopping sets of sizes smaller than 6. Alternatively, we can interpret this constraint in the sense that we use an outer code which "cleans up" the remaining small erasures. Using the techniques discussed in Section II-C, we can compute the probability that a randomly chosen element of an ensemble does not contain stopping sets of size smaller than 6. If this probability is not too small, then we have a good chance of finding such a code in the ensemble by sampling a sufficient number of random elements. This can be checked at the end of the optimization procedure.

We start with an arbitrary degree distribution pair: 

$$
\begin{aligned}
\lambda(x) ={}& 0.139976x + 0.149265x^2 + 0.174615x^3 \\
&+ 0.110137x^4 + 0.0184844x^5 + 0.0775212x^6 \\
&+ 0.0166585x^7 + 0.00832646x^8 + 0.0760256x^9 \\
&+ 0.0838369x^{10} + 0.0833654x^{11} + 0.0617885x^{12}, \qquad (4)\\
\rho(x) ={}& 0.0532687x + 0.0749403x^2 + 0.11504x^3 \\
&+ 0.0511266x^4 + 0.170892x^5 + 0.17678x^6 \\
&+ 0.0444454x^7 + 0.152618x^8 + 0.160889x^9. \qquad (5)
\end{aligned}
$$

This pair was generated randomly by choosing each coefficient uniformly in $[0, 1]$ and then normalizing so that $\lambda(1) = \rho(1) = 1$. The approximation of the block erasure probability curve of this code (as given in Section II-D) is shown in Fig. 1. For this initial degree distribution pair we have $r(\lambda, \rho) = 0.2029$ and $P_B(n = 5000, \lambda, \rho, \epsilon = 0.5) = 0.000552 > P_{\text{target}}$. Therefore, we start by reducing $P_B(n = 5000, \lambda, \rho, \epsilon = 0.5)$ (over the choice of $\lambda$ and $\rho$) using LP 2 until it becomes lower than $P_{\text{target}}$.

Fig. 1: Approximation of the block erasure probability for the initial ensemble with degree distribution pair given in (4) and (5).

Fig. 2: Approximation of the block erasure probability for the ensemble obtained after the first part of the optimization (see (6) and (7)). The erasure probability has been lowered below the target.

After a number of LP 2 rounds we obtain the degree distribution pair:

$$
\begin{aligned}
\lambda(x) ={}& 0.111913x + 0.178291x^2 + 0.203641x^3 \\
&+ 0.139163x^4 + 0.0475105x^5 + 0.106547x^6 \\
&+ 0.0240221x^7 + 0.0469994x^9 + 0.0548108x^{10} \\
&+ 0.0543393x^{11} + 0.0327624x^{12}, \qquad (6)\\
\rho(x) ={}& 0.0242426x + 0.101914x^2 + 0.142014x^3 \\
&+ 0.0781005x^4 + 0.198892x^5 + 0.177806x^6 \\
&+ 0.0174716x^7 + 0.125644x^8 + 0.133916x^9. \qquad (7)
\end{aligned}
$$

For this degree distribution pair we have $P_B(n = 5000, \lambda, \rho, \epsilon = 0.5) = 0.0000997 < P_{\text{target}}$ and $r(\lambda, \rho) = 0.218$. We show the corresponding approximation in Fig. 2.

Now, we start the second phase of the optimization and optimize the rate while ensuring that the block erasure probability remains below the target, using LP 1. The resulting degree distribution pair is:

$$
\begin{aligned}
\lambda(x) &= 0.0739196x + 0.657891x^2 + 0.268189x^{12}, \qquad (8)\\
\rho(x) &= 0.390753x^4 + 0.361589x^5 + 0.247658x^9, \qquad (9)
\end{aligned}
$$

where $r(\lambda, \rho) = 0.41065$. The block erasure probability plot for the result of the optimization is shown in Fig. 3.

Fig. 3: Error probability curve for the result of the optimization (see (8) and (9)). The solid curve is $P_B(n = 5000, \lambda, \rho, \epsilon = 0.5)$, while the small dots correspond to simulation points. The dotted curve shows the results with a more aggressive expurgation.



Each LP step takes on the order of seconds on a standard PC. In total, the optimization for a given set of parameters $(n, \epsilon, P_{\text{target}}, l_{\max}, r_{\max}, s_{\min})$ takes on the order of minutes.

Recall that the whole optimization procedure was based on $P_B(n, \lambda, \rho, \epsilon)$, which is only an approximation of the true block erasure probability. In principle, the actual performance of the optimized ensemble could be worse (or better) than predicted by $P_B(n, \lambda, \rho, \epsilon)$. To validate the procedure, we also computed the block erasure probability of the optimized degree distribution by means of simulations and compared the two. The simulation results are shown in Fig. 3 (dots with 95% confidence intervals): analytical (approximate) and numerical results are in almost perfect agreement!

How hard is it to find a code without stopping sets of size smaller than 6 within the ensemble LDPC$(5000, \lambda, \rho)$ with $(\lambda, \rho)$ given by Eqs. (8) and (9)? As discussed in more detail in Section II-C, in the limit of large blocklengths the number of small stopping sets has a joint Poisson distribution. As a consequence, if $\hat{A}_i$ denotes the expected number of minimal stopping sets of size $i$ in a random element from LDPC$(5000, \lambda, \rho)$, the probability that it contains no stopping set of size smaller than 6 is approximately $\exp\{-\sum_{i=1}^{5} \hat{A}_i\}$. For the optimized ensemble we get $\exp\{-(0.2073 + 0.04688 + 0.01676 + 0.007874 + 0.0043335)\} \approx 0.753$, a quite large probability. We repeated the optimization procedure with various random initial conditions and always ended up with essentially the same degree distribution. Therefore, we can be quite confident that the result of our local optimization is close to the globally optimal degree distribution pair for the given constraints $(n, \epsilon, P_{\text{target}}, l_{\max}, r_{\max}, s_{\min})$.
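The arithmetic of this Poisson estimate is easy to reproduce. The short Python sketch below (function name ours) uses the five values of $\hat{A}_i$ quoted above for the optimized ensemble.

```python
import math


def prob_no_small_stopping_sets(A_hat, s_min):
    """Poisson approximation exp(-sum_{s < s_min} A_hat[s]) of the probability
    that a random code contains no stopping set of size below s_min.

    A_hat: dict {size s: expected number of minimal stopping sets of size s}.
    """
    return math.exp(-sum(a for s, a in A_hat.items() if s < s_min))


A_hat = {1: 0.2073, 2: 0.04688, 3: 0.01676, 4: 0.007874, 5: 0.0043335}
print(round(prob_no_small_stopping_sets(A_hat, 6), 3))  # 0.753
```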

There are many ways of improving the result. For example, if we allow higher degrees or apply a more aggressive expurgation, we can obtain degree distribution pairs with higher rate. For the choice $l_{\max} = 15$ and $s_{\min} = 18$, the resulting degree distribution pair is

$$
\begin{aligned}
\lambda(x) &= 0.205031x + 0.455716x^2 + 0.193248x^{13} + 0.146004x^{14}, \qquad (10)\\
\rho(x) &= 0.608291x^5 + 0.391709x^6, \qquad (11)
\end{aligned}
$$

where $r(\lambda, \rho) = 0.433942$. The corresponding curve is depicted in Fig. 3 as a dotted line. However, this time the probability that a random element from LDPC$(5000, \lambda, \rho)$ has no stopping set of size smaller than 18 is approximately $6 \cdot 10^{-6}$. It will therefore be harder to find a code that fulfills the expurgation requirement.

It is worth stressing that our results could be improved further by applying the same approach to more powerful ensembles, e.g., multi-edge type ensembles or ensembles defined by protographs. The steps to be accomplished are: (i) derive the scaling laws and define scaling parameters for such ensembles; (ii) find efficiently computable expressions for the scaling parameters; (iii) optimize the ensemble with respect to its defining parameters (e.g., the degree distribution) as above. Each of these steps is a manageable task, albeit not a trivial one.

Another generalization of our approach, which is slated for future work, is the extension to general binary memoryless symmetric channels. Empirical evidence suggests that scaling laws should also hold in this case; see [2], [3]. How to prove this fact or how to compute the required parameters, however, is an open issue.

In the rest of this paper, we describe in detail the approximation $P(n, \lambda, \rho, \epsilon)$ for the BEC.

II. Approximation $P_B(n, \lambda, \rho, \epsilon)$ and $P_b(n, \lambda, \rho, \epsilon)$

In order to derive approximations for the erasure probability, we separate the contributions to this erasure probability into two parts: the contributions due to large erasure events and those due to small erasure events. The large erasure events give rise to the so-called waterfall curve, whereas the small erasure events are responsible for the erasure floor.

In Section II-B we recall that the waterfall curve follows a scaling law, and we discuss how to compute the scaling parameters. We denote this approximation of the waterfall curve by $P^{W}_{B/b}(n, \lambda, \rho, \epsilon)$. We next show in Section II-C how to approximate the erasure floor. We call this approximation $P^{EF, s_{\min}}_{B/b}(n, \lambda, \rho, \epsilon)$. Here $s_{\min}$ denotes the expurgation parameter, i.e., we only count error events involving at least $s_{\min}$ erasures. Finally, in Section II-D we collect our results and give an approximation to the total erasure probability. We start in Section II-A with a short review of density evolution.

A. Density Evolution 

The initial analysis of the performance of LDPC codes assuming that transmission takes place over the BEC is due to Luby, Mitzenmacher, Shokrollahi, Spielman and Stemann, see [6], and it is based on the so-called peeling algorithm. In this algorithm we "peel off" one variable node at a time (and all its adjacent check nodes and edges), creating a sequence of residual graphs. Decoding is successful if and only if the final residual graph is the empty graph. A variable node can be peeled off if it is connected to at least one check node which has residual degree one. Initially, we start with the complete Tanner graph representing the code, and in the first step we delete all variable nodes from the graph which have been received (have not been erased), all connected check nodes, and all connected edges.
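The following Python sketch illustrates the peeling idea on a toy example. It is a simplified illustration rather than the exact bookkeeping of [6]: each check node keeps the set of its still-erased neighbours together with the parity of its already-known neighbours, so that removing a recovered variable node amounts to deleting its edges and folding its value into those parities. The small parity-check structure used below (a (7,4) Hamming code) is our own choice and is not an element of an LDPC ensemble.

```python
def peel(checks, n, known):
    """Peeling decoder for the BEC (illustrative sketch).

    checks: list of check nodes, each given as a list of variable indices.
    n:      number of variable nodes.
    known:  dict {variable index: bit} for the received (non-erased) bits.
    Returns (bits, erased): the recovered bits and the set of variable nodes
    still erased when no check node of residual degree one is left.
    """
    bits = dict(known)
    # residual graph: still-erased neighbours of each check, plus the parity
    # (XOR) of its neighbours that are already known
    residual = [{v for v in c if v not in bits} for c in checks]
    parity = [sum(bits[v] for v in c if v in bits) % 2 for c in checks]
    checks_of = {v: [] for v in range(n)}
    for j, c in enumerate(checks):
        for v in c:
            checks_of[v].append(j)
    queue = [j for j, res in enumerate(residual) if len(res) == 1]
    while queue:
        j = queue.pop()
        if len(residual[j]) != 1:     # degree changed since it was queued
            continue
        v = residual[j].pop()
        bits[v] = parity[j]           # a degree-one check fixes its neighbour
        for k in checks_of[v]:        # peel v: remove its remaining edges
            if v in residual[k]:
                residual[k].discard(v)
                parity[k] ^= bits[v]
                if len(residual[k]) == 1:
                    queue.append(k)
    erased = set(range(n)) - set(bits)
    return bits, erased


checks = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]   # (7,4) Hamming code
codeword = [1, 0, 1, 1, 0, 0, 1]
bits, erased = peel(checks, 7, {v: b for v, b in enumerate(codeword) if v not in (0, 1)})
print(erased)   # set(): both erasures are recovered
_, stuck = peel(checks, 7, {v: b for v, b in enumerate(codeword) if v not in (0, 1, 2)})
print(stuck)    # {0, 1, 2}: a stopping set, decoding halts
```

The second call shows the failure mode discussed next: the decoder stops exactly when no check node of residual degree one remains.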

From the description of the algorithm it should be clear that the number of degree-one check nodes plays a crucial role. The algorithm stops if and only if no degree-one check node remains in the residual graph. Luby et al. were able to give analytic expressions for the expected number of degree-one check nodes as a function of the size of the residual graph in the limit of large blocklengths. They further showed that most instances of the graph and the channel closely follow this ensemble average. More precisely, let $r_1$ denote the fraction of degree-one check nodes in the decoder. (This means that the actual number of degree-one check nodes is equal to $n(1 - r)r_1$, where $n$ is the blocklength and $r$ is the design rate of the code.) Then, as shown in [6], $r_1$ is given parametrically by

$$r_1(y) = \epsilon\lambda(y)\left[y - 1 + \rho\left(1 - \epsilon\lambda(y)\right)\right], \qquad (12)$$

where $y$ is determined so that $\epsilon L(y)$ is the fractional (with respect to $n$) size of the residual graph. Here $L(x) = \sum_i L_i x^i = \frac{\int_0^x \lambda(u)\,du}{\int_0^1 \lambda(u)\,du}$ is the node-perspective variable node degree distribution, i.e., $L_i$ is the fraction of variable nodes of degree $i$ in the Tanner graph. Analogously, we let $R_i$ denote the fraction of degree-$i$ check nodes, and set $R(x) = \sum_i R_i x^i$. With an abuse of notation we shall sometimes denote the irregular LDPC ensemble as LDPC$(n, L, R)$.
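Both formulas of this paragraph translate directly into code. The Python sketch below (the dictionary representation {exponent: coefficient} and the function names are our own assumptions) evaluates $r_1(y)$ from (12) and converts the edge-perspective $\lambda(x)$ into the node-perspective $L(x)$.

```python
def poly(p, x):
    """Evaluate a polynomial stored as {exponent: coefficient}."""
    return sum(c * x ** k for k, c in p.items())


def node_perspective(lam):
    """L(x) = int_0^x lambda / int_0^1 lambda, returned as {exponent: coefficient}."""
    norm = sum(c / (k + 1) for k, c in lam.items())
    return {k + 1: c / (k + 1) / norm for k, c in lam.items()}


def r1(y, eps, lam, rho):
    """Eq. (12): r_1(y) = eps*lambda(y) * [y - 1 + rho(1 - eps*lambda(y))]."""
    x = eps * poly(lam, y)
    return x * (y - 1 + poly(rho, 1 - x))


lam, rho = {2: 1.0}, {5: 1.0}        # lambda(x) = x^2, rho(x) = x^5
print(node_perspective(lam))          # {3: 1.0}, i.e. L(x) = x^3
```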

The threshold noise parameter $\epsilon^* = \epsilon^*(\lambda, \rho)$ is the supremum value of $\epsilon$ such that $r_1(y) > 0$ for all $y \in (0, 1]$ (and therefore iterative decoding is successful with high probability). In Fig. 4 we show the function $r_1(y)$ for the ensemble with $\lambda(x) = x^2$ and $\rho(x) = x^5$ at $\epsilon = \epsilon^*$. As the fraction of degree-one check nodes concentrates around $r_1(y)$, the decoder will fail with high probability in only two possible ways. The first relates to $y \approx 0$ and corresponds to small erasure events. The second one corresponds to the value $y^*$ such that $r_1(y^*) = 0$. In this case the fraction of variable nodes that cannot be decoded concentrates around $\nu^* = \epsilon^* L(y^*)$.

Fig. 4: $r_1(y)$ for $y \in [0, 1]$ at the threshold. The degree distribution pair is $\lambda(x) = x^2$ and $\rho(x) = x^5$, and the threshold is $\epsilon^* = 0.4294381$.

We call a point $y^*$ where the function $y - 1 + \rho(1 - \epsilon\lambda(y))$ and its derivative both vanish a critical point. At threshold, i.e., for $\epsilon = \epsilon^*$, there is at least one critical point, but there may be more than one. (Notice that the function $r_1(y)$ always vanishes together with its derivative at $y = 0$, cf. Fig. 4. However, this does not imply that $y = 0$ is a critical point, because of the extra factor $\lambda(y)$ in the definition of $r_1(y)$.) Note that if an ensemble has a single critical point and this point is strictly positive, then the number of remaining erasures conditioned on decoding failure concentrates around $\nu^* = \epsilon^* L(y^*)$.
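The threshold and the critical point can be located numerically. The sketch below is one simple way of doing so (our own, not a method from the paper): bisect on $\epsilon$, declaring $\epsilon$ below threshold when $y - 1 + \rho(1 - \epsilon\lambda(y))$ is positive at every point of a fine grid in $(0, 1]$, and then take $y^*$ as the grid point where this function divided by $y$ is smallest. The grid size and the number of bisection steps are arbitrary choices.

```python
def poly(p, x):
    return sum(c * x ** k for k, c in p.items())


def scaled_gap(y, eps, lam, rho):
    # (y - 1 + rho(1 - eps*lambda(y))) / y; r_1(y) > 0 on (0, 1] iff this is > 0
    return (y - 1 + poly(rho, 1 - eps * poly(lam, y))) / y


def threshold_and_critical_point(lam, rho, grid=4000, steps=40):
    ys = [(i + 1) / grid for i in range(grid)]
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if min(scaled_gap(y, mid, lam, rho) for y in ys) > 0:
            lo = mid          # decoding succeeds: the threshold is above mid
        else:
            hi = mid
    y_star = min(ys, key=lambda y: scaled_gap(y, lo, lam, rho))
    return lo, y_star


lam, rho = {2: 1.0}, {5: 1.0}                # lambda(x) = x^2, rho(x) = x^5
eps_star, y_star = threshold_and_critical_point(lam, rho)
nu_star = eps_star * y_star ** 3             # nu* = eps* L(y*), with L(x) = x^3
print(round(eps_star, 4), round(y_star, 3), round(nu_star, 3))  # 0.4294 0.779 0.203
```

The grid search only locates the smallest critical point reliably when there is a single one, which is exactly the case considered in the rest of the paper.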

In the rest of this paper, we will consider ensembles with a single critical point and separate the two above contributions. In Section II-B we will consider erasures of size at least $n\gamma\nu^*$ with $\gamma \in (0, 1)$. In Section II-C we will instead focus on erasures of size smaller than $n\gamma\nu^*$. We will finally combine the two results in Section II-D.

B. Waterfall Region 

It was proved in [3] that the erasure probability due to large failures obeys a well-defined scaling law. For our purpose it is best to consider a refined scaling law, which was conjectured in the same paper. For the convenience of the reader we restate it here.
Conjecture 1: [Refined Scaling Law] Consider transmission over a BEC of erasure probability $\epsilon$ using random elements from the ensemble LDPC$(n, \lambda, \rho)$ = LDPC$(n, L, R)$. Assume that the ensemble has a single critical point $y^* > 0$ and let $\nu^* = \epsilon^* L(y^*)$, where $\epsilon^*$ is the threshold erasure probability. Let $P_b^{W}(n, \lambda, \rho, \epsilon)$ (respectively, $P_B^{W}(n, \lambda, \rho, \epsilon)$) denote the expected bit (block) erasure probability due to erasures of size at least $n\gamma\nu^*$, where $\gamma \in (0, 1)$. Fix $z := \sqrt{n}\left(\epsilon^* - \beta n^{-2/3} - \epsilon\right)$. Then as $n$ tends to infinity,

$$P_B^{W}(n, \lambda, \rho, \epsilon) = Q\left(\frac{z}{\alpha}\right)\left(1 + O(n^{-1/3})\right),$$

$$P_b^{W}(n, \lambda, \rho, \epsilon) = \nu^* Q\left(\frac{z}{\alpha}\right)\left(1 + O(n^{-1/3})\right),$$

where $\alpha = \alpha(\lambda, \rho)$ and $\beta = \beta(\lambda, \rho)$ are constants which depend on the ensemble.
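Once $\epsilon^*$, $\nu^*$, $\alpha$ and $\beta$ are known, the leading terms of Conjecture 1 are cheap to evaluate. The Python sketch below drops the $O(n^{-1/3})$ corrections and writes $Q$ via the complementary error function, $Q(z) = \operatorname{erfc}(z/\sqrt{2})/2$. The parameter values in the usage lines are the ones we obtain for $\lambda(x) = x^2$, $\rho(x) = x^5$ from Lemma 1 and Conjecture 2 below with $\Omega \approx 1$ ($\alpha \approx 0.5604$, $\beta \approx 0.617$, $\nu^* \approx 0.203$); they serve only as an illustration.

```python
import math


def Q(z):
    """Gaussian tail probability Q(z) = P(N(0, 1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def waterfall(n, eps, eps_star, alpha, beta, nu_star):
    """Leading terms of Conjecture 1: returns (P_B^W, P_b^W)."""
    z = math.sqrt(n) * (eps_star - beta * n ** (-2.0 / 3.0) - eps)
    P_block = Q(z / alpha)
    return P_block, nu_star * P_block


# (3,6)-regular ensemble, illustrative parameters
for eps in (0.38, 0.40, 0.42):
    print(eps, waterfall(1024, eps, 0.42944, 0.5604, 0.617, 0.203))
```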

In [3], [5], a procedure called covariance evolution was defined to compute the scaling parameter $\alpha$ through the solution of a system of ordinary differential equations. The number of equations in the system is equal to the square of (the number of variable node degrees + the largest check node degree − 1). As an example, for an ensemble with 5 different variable node degrees and $r_{\max} = 30$, the number of coupled equations in covariance evolution is $(5 + 29)^2 = 1156$. The computation of the scaling parameter can therefore become a challenging task. The main result of this paper is that it is possible to compute the scaling parameter $\alpha$ without explicitly solving covariance evolution. This is the crucial ingredient allowing for efficient code optimization.

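Lemma 1: [Expression for $\alpha$] Consider transmission over a BEC with erasure probability $\epsilon$ using random elements from the ensemble LDPC$(n, \lambda, \rho)$ = LDPC$(n, L, R)$. Assume that the ensemble has a single critical point $y^* > 0$, and let $\epsilon^*$ denote the threshold erasure probability. Then the scaling parameter $\alpha$ in Conjecture 1 is given by

$$\alpha = \left(\frac{\rho(\bar{x}^*)^2 - \rho(\bar{x}^{*2}) + \rho'(\bar{x}^*)\left(1 - 2x^*\rho(\bar{x}^*)\right) - \bar{x}^{*2}\rho'(\bar{x}^{*2})}{L'(1)\,\lambda(y^*)^2\,\rho'(\bar{x}^*)^2} + \frac{\epsilon^{*2}\lambda(y^*)^2 - \epsilon^{*2}\lambda(y^{*2}) - y^{*2}\epsilon^{*2}\lambda'(y^{*2})}{L'(1)\,\lambda(y^*)^2}\right)^{1/2},$$

where $x^* = \epsilon^*\lambda(y^*)$ and $\bar{x}^* = 1 - x^*$.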
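Lemma 1 can be evaluated directly once $\epsilon^*$ and $y^*$ are available (for instance from a numerical threshold computation). The sketch below, with our own dictionary representation and function names, implements the formula term by term; for $\lambda(x) = x^2$, $\rho(x) = x^5$ with $\epsilon^* \approx 0.42944$ and $y^* \approx 0.77895$ it evaluates to about 0.560.

```python
import math


def poly(p, x):
    return sum(c * x ** k for k, c in p.items())


def dpoly(p, x):
    """First derivative of a polynomial stored as {exponent: coefficient}."""
    return sum(k * c * x ** (k - 1) for k, c in p.items() if k > 0)


def alpha_lemma1(lam, rho, eps_s, y_s):
    """Scaling parameter alpha of Lemma 1 (single critical point y_s > 0)."""
    Lp1 = 1.0 / sum(c / (k + 1) for k, c in lam.items())   # L'(1)
    lam_y = poly(lam, y_s)
    x = eps_s * lam_y                                       # x*
    xb = 1.0 - x                                            # x-bar*
    rho_term = (poly(rho, xb) ** 2 - poly(rho, xb ** 2)
                + dpoly(rho, xb) * (1.0 - 2.0 * x * poly(rho, xb))
                - xb ** 2 * dpoly(rho, xb ** 2))
    lam_term = eps_s ** 2 * (lam_y ** 2 - poly(lam, y_s ** 2)
                             - y_s ** 2 * dpoly(lam, y_s ** 2))
    value = (rho_term / (Lp1 * lam_y ** 2 * dpoly(rho, xb) ** 2)
             + lam_term / (Lp1 * lam_y ** 2))
    return math.sqrt(value)


print(round(alpha_lemma1({2: 1.0}, {5: 1.0}, 0.42944, 0.77895), 3))  # 0.56
```

Only $\lambda$, $\rho$ and the density-evolution quantities $\epsilon^*$, $y^*$ enter, so each evaluation costs a handful of polynomial evaluations instead of an integration of covariance evolution.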

The derivation of this expression for $\alpha$ is explained in Section III. For completeness and the convenience of the reader, we also repeat here an explicit characterization of the shift parameter $\beta$, which appeared already (in a slightly different form) in [3], [5].

Conjecture 2: [Scaling Parameter $\beta$] Consider transmission over a BEC of erasure probability $\epsilon$ using random elements from the ensemble LDPC$(n, \lambda, \rho)$ = LDPC$(n, L, R)$. Assume that the ensemble has a single critical point $y^* > 0$, and let $\epsilon^*$ denote the threshold erasure probability. Then the scaling parameter $\beta$ in Conjecture 1 is given by

$$\frac{\beta}{\Omega} = \left(\frac{\epsilon^{*4}\, r_2^{*2}\left(\epsilon^*\lambda'(y^*)^2 r_2^* - x^*\left(\lambda''(y^*) r_2^* + \lambda'(y^*) x^*\right)\right)^2}{L'(1)^2\,\rho'(\bar{x}^*)^3\, x^{*10}\left(2\epsilon^*\lambda'(y^*)^2 r_3^* - \lambda''(y^*) r_2^* x^*\right)}\right)^{1/3},$$

where $x^* = \epsilon^*\lambda(y^*)$ and $\bar{x}^* = 1 - x^*$, and for $i \ge 2$

$$r_i^* = \sum_{m \ge j \ge i} (-1)^{i+j}\binom{j-1}{i-1}\binom{m-1}{j-1}\rho_m\left(\epsilon^*\lambda(y^*)\right)^j.$$

Further, $\Omega$ is a universal (code-independent) constant defined in [3], [5].

We also recall that $\Omega$ is numerically quite close to 1. In the rest of this paper, we shall always adopt the approximation $\Omega \approx 1$.
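The expression of Conjecture 2 can be coded term by term in the same way. The sketch below (same hypothetical representation, with $\rho_m$ stored under the exponent $m - 1$) returns $\beta/\Omega$; with $\Omega \approx 1$ this is the value of $\beta$ used in the approximation. For $\lambda(x) = x^2$, $\rho(x) = x^5$ it gives about 0.617.

```python
import math


def poly(p, x):
    return sum(c * x ** k for k, c in p.items())


def dpoly(p, x):
    return sum(k * c * x ** (k - 1) for k, c in p.items() if k > 0)


def d2poly(p, x):
    return sum(k * (k - 1) * c * x ** (k - 2) for k, c in p.items() if k > 1)


def r_star(i, rho, x):
    """r_i^* = sum_{m >= j >= i} (-1)^(i+j) C(j-1,i-1) C(m-1,j-1) rho_m x^j."""
    total = 0.0
    for k, rho_m in rho.items():
        m = k + 1                      # exponent k <-> check-node degree m
        for j in range(i, m + 1):
            total += ((-1) ** (i + j) * math.comb(j - 1, i - 1)
                      * math.comb(m - 1, j - 1) * rho_m * x ** j)
    return total


def beta_over_omega(lam, rho, eps_s, y_s):
    Lp1 = 1.0 / sum(c / (k + 1) for k, c in lam.items())   # L'(1)
    x = eps_s * poly(lam, y_s)                              # x*
    xb = 1.0 - x                                            # x-bar*
    r2, r3 = r_star(2, rho, x), r_star(3, rho, x)
    l1, l2 = dpoly(lam, y_s), d2poly(lam, y_s)
    num = (eps_s ** 4 * r2 ** 2
           * (eps_s * l1 ** 2 * r2 - x * (l2 * r2 + l1 * x)) ** 2)
    den = (Lp1 ** 2 * dpoly(rho, xb) ** 3 * x ** 10
           * (2.0 * eps_s * l1 ** 2 * r3 - l2 * r2 * x))
    ratio = num / den
    return math.copysign(abs(ratio) ** (1.0 / 3.0), ratio)   # real cube root


print(round(beta_over_omega({2: 1.0}, {5: 1.0}, 0.42944, 0.77895), 2))  # 0.62
```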

C. Error Floor 

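Lemma 2: [Error Floor] Consider transmission over a BEC of erasure probability $\epsilon$ using random elements from an ensemble LDPC$(n, \lambda, \rho)$ = LDPC$(n, L, R)$. Assume that the ensemble has a single critical point $y^* > 0$. Let $\nu^* = \epsilon^* L(y^*)$, where $\epsilon^*$ is the threshold erasure probability. Let $P_b^{EF, s_{\min}}(n, \lambda, \rho, \epsilon)$ (respectively, $P_B^{EF, s_{\min}}(n, \lambda, \rho, \epsilon)$) denote the expected bit (block) erasure probability due to stopping sets of size between $s_{\min}$ and $n\gamma\nu^*$, where $\gamma \in (0, 1)$. Then, for any $\epsilon < \epsilon^*$,

$$P_b^{EF, s_{\min}}(n, \lambda, \rho, \epsilon) = \sum_{s \ge s_{\min}} \frac{s\,\hat{A}_s\,\epsilon^s}{n}\left(1 + o(1)\right), \qquad (14)$$

$$P_B^{EF, s_{\min}}(n, \lambda, \rho, \epsilon) = \left(1 - e^{-\sum_{s \ge s_{\min}} \hat{A}_s \epsilon^s}\right)\left(1 + o(1)\right), \qquad (15)$$

where $\hat{A}_s = \operatorname{coef}\{\log(A(x)), x^s\}$ for $s \ge 1$, with $A(x) = \sum_{s \ge 0} A_s x^s$ and

$$A_s = \sum_e \frac{\operatorname{coef}\left\{\prod_i (1 + x y^i)^{nL_i}, x^s y^e\right\}\,\operatorname{coef}\left\{\prod_i \left((1 + x)^i - i x\right)^{n(1-r)R_i}, x^e\right\}}{\binom{nL'(1)}{e}}. \qquad (16)$$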
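For small blocklengths, (16) can be evaluated exactly with integer polynomial arithmetic. The Python sketch below is one straightforward (and deliberately unoptimized) way to do so. It takes the numbers of variable and check nodes of each degree, i.e., $nL_i$ and $n(1-r)R_i$, as integers, which is an assumption of the sketch; the function names are our own. For the (3,6)-regular ensemble with $n = 12$ it gives $A_1 = 24/119$.

```python
from fractions import Fraction
from math import comb


def mul(p, q):
    """Multiply polynomials stored as {exponent tuple: integer coefficient}."""
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            key = tuple(u + v for u, v in zip(a, b))
            out[key] = out.get(key, 0) + ca * cb
    return out


def power(p, e, one):
    out = {one: 1}
    for _ in range(e):
        out = mul(out, p)
    return out


def stopping_set_enumerator(var_counts, chk_counts):
    """Expected number A_s of stopping sets of size s, Eq. (16), s = 0..n.

    var_counts: {degree i: number of variable nodes of degree i}  (= n L_i)
    chk_counts: {degree i: number of check nodes of degree i}  (= n(1-r) R_i)
    """
    # prod_i (1 + x y^i)^{n L_i}: x counts variable nodes, y counts edges
    V = {(0, 0): 1}
    for i, cnt in var_counts.items():
        V = mul(V, power({(0, 0): 1, (1, i): 1}, cnt, (0, 0)))
    # prod_i ((1 + x)^i - i x)^{n(1-r) R_i}: x counts edges; dropping the
    # linear term means no check node sees exactly one edge of the set
    C = {(0,): 1}
    for i, cnt in chk_counts.items():
        base = {(k,): comb(i, k) for k in range(i + 1) if k != 1}
        C = mul(C, power(base, cnt, (0,)))
    n = sum(var_counts.values())
    edges = sum(i * cnt for i, cnt in var_counts.items())   # n L'(1)
    A = [Fraction(0)] * (n + 1)
    for (s, e), c in V.items():
        A[s] += Fraction(c * C.get((e,), 0), comb(edges, e))
    return A


A = stopping_set_enumerator({3: 12}, {6: 6})   # (3,6)-regular, n = 12
print(A[0], A[1])   # 1 24/119
```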



Discussion: In the lemma we only claim a multiplicative error term of the form $o(1)$, since this is easy to prove. This weak statement would remain valid if we replaced the expression for $A_s$ given in (16) with the explicit and much easier to compute asymptotic expression derived in [1]. In practice, however, the approximation is much better than the stated $o(1)$ error term if we use the finite-length averages given by (16). The hurdle in proving stronger error terms is that, for a given length, it is not clear how to relate the number of stopping sets to the number of minimal stopping sets. However, this relationship becomes easy in the limit of large blocklengths.

Proof: The key in deriving this erasure floor expression is to focus on the number of minimal stopping sets. These are stopping sets that are not the union of smaller stopping sets. The asymptotic distribution of the number of minimal stopping sets contained in an LDPC graph was already studied in [1]. We recall that the distribution of the number of minimal stopping sets tends to a Poisson distribution with independent components as the length tends to infinity. Because of this independence, one can relate the number of minimal stopping sets to the number of stopping sets: any combination of minimal stopping sets gives rise to a stopping set. In the limit of infinite blocklengths the minimal stopping sets are non-overlapping with probability one, so that the weight of the resulting stopping set is just the sum of the weights of the individual stopping sets. For example, the number of stopping sets of size two is equal to the number of minimal stopping sets of size two plus the number of stopping sets we get by taking all pairs of (minimal) stopping sets of size one.

Therefore, define $\hat{A}(x) = \sum_{s \ge 1} \hat{A}_s x^s$, with $\hat{A}_s$ the expected number of minimal stopping sets of size $s$ in the graph. Define further $A(x) = \sum_{s \ge 0} A_s x^s$, with $A_s$ the expected number of stopping sets of size $s$ in the graph (not necessarily minimal). We then have

$$A(x) = e^{\hat{A}(x)} = 1 + \hat{A}(x) + \frac{\hat{A}(x)^2}{2!} + \frac{\hat{A}(x)^3}{3!} + \cdots,$$

so that conversely $\hat{A}(x) = \log(A(x))$.

It remains to determine the number of stopping sets. As remarked right after the statement of the lemma, any expression which converges in the limit of large blocklength to the asymptotic value would satisfy the statement of the lemma, but we get the best empirical agreement for short lengths if we use the exact finite-length averages. These averages were already computed in [1] and are given in (16).

Consider now, e.g., the bit erasure probability. We first compute $A(x)$ using (16) and then $\hat{A}(x)$ by means of $\hat{A}(x) = \log(A(x))$. Consider one minimal stopping set of size $s$. The probability that its $s$ associated bits are all erased is equal to $\epsilon^s$, and if this is the case this stopping set causes $s$ erasures. Since there are in expectation $\hat{A}_s$ minimal stopping sets of size $s$, and minimal stopping sets are non-overlapping with increasing probability as the blocklength increases, a simple union bound is asymptotically tight. The expression for the block erasure probability is derived in a similar way. Now we are interested in the probability that a particular graph and noise realization results in no (small) stopping set. Using the fact that the distribution of minimal stopping sets follows a Poisson distribution, we get equation (15). ■
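The two computational steps of the proof, taking the power-series logarithm $\hat{A}(x) = \log A(x)$ and summing the contributions of minimal stopping sets as in (14) and (15), can be sketched as follows. The recursion uses $A_0 = 1$ and follows from $A'(x) = A(x)\hat{A}'(x)$; the function names are ours, and the caller is assumed to pass only sizes up to $n\gamma\nu^*$.

```python
import math


def series_log(A, smax):
    """Coefficients hat_A[0..smax] of log A(x), assuming A[0] == 1.

    From A' = A * hat_A': s*A[s] = sum_{k=1}^{s} k*hat_A[k]*A[s-k].
    """
    hat_A = [0.0] * (smax + 1)
    for s in range(1, smax + 1):
        acc = s * A[s] - sum(k * hat_A[k] * A[s - k] for k in range(1, s))
        hat_A[s] = acc / s
    return hat_A


def error_floor(hat_A, eps, n, s_min):
    """Leading terms of Eqs. (14) and (15): returns (P_b^EF, P_B^EF)."""
    terms = {s: a * eps ** s for s, a in enumerate(hat_A) if s >= max(s_min, 1)}
    P_bit = sum(s * t for s, t in terms.items()) / n
    P_block = 1.0 - math.exp(-sum(terms.values()))
    return P_bit, P_block
```

As a sanity check, feeding `series_log` the coefficients $c^s/s!$ of $e^{cx}$ returns $c$ for $s = 1$ and zero for all higher orders, matching the relation $A(x) = e^{\hat{A}(x)}$ used in the proof.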

D. Complete Approximation 

In Section II-B we have studied the erasure probability stemming from failures of size bigger than $n\gamma\nu^*$, where $\gamma \in (0, 1)$ and $\nu^* = \epsilon^* L(y^*)$, i.e., $\nu^*$ is the asymptotic fractional number of erasures remaining after decoding at the threshold. In Section II-C we have studied the probability of erasures resulting from stopping sets of size between $s_{\min}$ and $n\gamma\nu^*$. Combining the results in the two previous sections, we get

$$P_B(n, \lambda, \rho, \epsilon) = P_B^{W}(n, \lambda, \rho, \epsilon) + P_B^{EF, s_{\min}}(n, \lambda, \rho, \epsilon) \simeq Q\left(\frac{\sqrt{n}\left(\epsilon^* - \beta n^{-2/3} - \epsilon\right)}{\alpha}\right) + 1 - e^{-\sum_{s \ge s_{\min}} \hat{A}_s \epsilon^s}, \qquad (17)$$

$$P_b(n, \lambda, \rho, \epsilon) = P_b^{W}(n, \lambda, \rho, \epsilon) + P_b^{EF, s_{\min}}(n, \lambda, \rho, \epsilon) \simeq \nu^* Q\left(\frac{\sqrt{n}\left(\epsilon^* - \beta n^{-2/3} - \epsilon\right)}{\alpha}\right) + \sum_{s \ge s_{\min}} \frac{s\,\hat{A}_s\,\epsilon^s}{n}. \qquad (18)$$

Here we assume that there is a single critical point. If the degree distribution has several critical points (at different values of the channel parameter $\epsilon_1^*, \epsilon_2^*, \ldots$), then we simply take a sum of terms $P^{W}(n, \lambda, \rho, \epsilon)$, one for each critical point.

Let us finally notice that summing the probabilities of different error types provides in principle only an upper bound on the overall error probability. However, for each given channel parameter $\epsilon$, only one of the terms in Eqs. (17) and (18) dominates. As a consequence, the bound is actually tight.
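Putting the pieces together, (17) and (18) can be sketched as a single function. It assumes a single critical point and takes $\epsilon^*$, $\alpha$, $\beta$, $\nu^*$ and the values $\hat{A}_s$ as inputs (for example from the computations described in the previous sections); the function name and argument layout are hypothetical.

```python
import math


def approx_erasure_probability(n, eps, eps_star, alpha, beta, nu_star,
                               hat_A, s_min):
    """Approximations (17)-(18); hat_A maps size s -> expected number of
    minimal stopping sets of size s. Returns (P_B, P_b)."""
    z = math.sqrt(n) * (eps_star - beta * n ** (-2.0 / 3.0) - eps) / alpha
    waterfall = 0.5 * math.erfc(z / math.sqrt(2.0))            # Q(z)
    floor = {s: a * eps ** s for s, a in hat_A.items() if s >= s_min}
    P_block = waterfall + 1.0 - math.exp(-sum(floor.values()))
    P_bit = nu_star * waterfall + sum(s * t for s, t in floor.items()) / n
    return P_block, P_bit
```

Evaluated over a range of $\epsilon$, the first term dominates near the threshold and the second far below it, which is the behavior that makes the simple sum in (17) and (18) tight in practice.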

III. Analytic Determination of $\alpha$

Let us now show how the scaling parameter $\alpha$ can be determined analytically. We accomplish this in two steps. We first compute the variance of the number of erasure messages. Then we show in a second step how to relate this variance to the scaling parameter $\alpha$.

A. Variance of the Messages 

Consider the ensemble LDPC$(n, \lambda, \rho)$ and assume that transmission takes place over a BEC of parameter $\epsilon$. Perform $\ell$ iterations of BP decoding. Set $\mu_i^{(\ell)}$ equal to 1 if the message sent out along edge $i$ from variable to check node is an erasure, and 0 otherwise. Consider the variance of these messages in the limit of large blocklengths. More precisely, consider

$$\lim_{n \to \infty} \frac{\mathbb{E}\left[\left(\sum_i \mu_i^{(\ell)}\right)^2\right] - \mathbb{E}\left[\sum_i \mu_i^{(\ell)}\right]^2}{nL'(1)}.$$

Lemma 3 in Section IV contains an analytic expression for this quantity as a function of the degree distribution pair $(\lambda, \rho)$, the channel parameter $\epsilon$, and the number of iterations $\ell$. Let us consider this variance as a function of the parameter $\epsilon$ and the number of iterations $\ell$. Figure 5 shows the result of this evaluation for an example ensemble whose node-perspective degree distributions $L(x)$ and $R(x)$ both consist of degree-2 and degree-3 terms. The threshold for this example is $\epsilon^* \approx 0.8495897455$. This value is indicated as a vertical line in the figure. As we can see from this figure, the variance is a unimodal function of the channel parameter. It is zero for the extremal values of $\epsilon$ (either all messages are known or all are erased), and it takes on a maximum value at a value of $\epsilon$ which approaches the critical value $\epsilon^*$ as $\ell$ increases. Further, for increasing $\ell$ the maximum value of the variance increases. The limit of these curves as $\ell$ tends to infinity, $V = \lim_{\ell \to \infty}$ of the quantity above, is also shown (bold curve): the variance is zero below threshold; above threshold it is positive and diverges as the threshold is approached. In Section IV we state the exact form of the limiting curve. We show that for $\epsilon$ approaching $\epsilon^*$ from above

$$V = \frac{\gamma}{\left(1 - \epsilon\lambda'(y)\rho'(\bar{x})\right)^2} + O\left(\left(1 - \epsilon\lambda'(y)\rho'(\bar{x})\right)^{-1}\right), \qquad (19)$$

where

$$\gamma = \epsilon^{*2}\lambda'(y^*)^2\Big\{\left[\rho(\bar{x}^*)^2 - \rho(\bar{x}^{*2}) + \rho'(\bar{x}^*)\left(1 - 2x^*\rho(\bar{x}^*)\right) - \bar{x}^{*2}\rho'(\bar{x}^{*2})\right] + \epsilon^{*2}\rho'(\bar{x}^*)^2\left[\lambda(y^*)^2 - \lambda(y^{*2}) - y^{*2}\lambda'(y^{*2})\right]\Big\}.$$

Here $y^*$ is the unique critical point, $x^* = \epsilon^*\lambda(y^*)$, and $\bar{x}^* = 1 - x^*$. Since $1 - \epsilon\lambda'(y)\rho'(\bar{x}) = \Theta(\sqrt{\epsilon - \epsilon^*})$, Eq. (19) implies a divergence at $\epsilon^*$.

Fig. 5: The variance as a function of $\epsilon$ and $\ell = 0, \ldots, 9$ for the example ensemble $(L(x), R(x))$.

B. Relation Between $\gamma$ and $\alpha$

Now that we know the asymptotic variance of the edge messages, let us discuss how this quantity can be related to the scaling parameter $\alpha$. Think of a decoder operating above the threshold of the code. Then, for large blocklengths, it will get stuck with high probability before correcting all nodes. In Fig. 6 we show $R_1$, the number of degree-one check nodes, as a function of the number of erased messages for a few decoding runs. Let $V^*$ represent the normalized variance of the number of erased messages in the decoder after an infinite number of iterations:

$$V^* = \lim_{n\to\infty}\lim_{\ell\to\infty}\frac{\mathbb{E}\big[\big(\sum_i \mu_i^{(\ell)}\big)^2\big] - \mathbb{E}\big[\sum_i \mu_i^{(\ell)}\big]^2}{nL'(1)}.$$

In other words, $V^*$ is the variance of the point at which the decoding trajectories hit the $R_1 = 0$ axis.

Fig. 6: Number of degree-one check nodes as a function of the number of erased messages in the corresponding BP decoder for LDPC($n = 8192$, $\lambda(x) = \ldots$, $\rho(x) = x^{\ldots}$). The thin lines represent the decoding trajectories that stop when $r_1 = 0$, and the thick line is the mean curve predicted by density evolution.
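Trajectories of the kind shown in Fig. 6 can be produced with a peeling decoder. The sketch below is hypothetical in several respects. It uses an $(l, r)$-regular configuration-model graph rather than the ensemble of Fig. 6, and it allows multi-edges for simplicity. As its x-coordinate it records the number of edges left in the residual graph, alongside the current number $R_1$ of degree-one check nodes. Decoding stops exactly when $R_1$ reaches zero, and $V^*$ describes the variance of that stopping point.

```python
import random


def regular_graph(n, l, r, rng):
    """Configuration-model (l, r)-regular bipartite graph; multi-edges are allowed."""
    assert (n * l) % r == 0, "n * l must be divisible by r"
    m = n * l // r                                    # number of check nodes
    sockets = [c for c in range(m) for _ in range(r)]
    rng.shuffle(sockets)                              # random matching of edge sockets
    return [sockets[v * l:(v + 1) * l] for v in range(n)], m


def peeling_trajectory(n, l, r, eps, seed=0):
    """Run the peeling decoder; return [(residual edges, R1), ...] after each step."""
    rng = random.Random(seed)
    var_nbrs, m = regular_graph(n, l, r, rng)
    erased = [rng.random() < eps for _ in range(n)]  # BEC erasures
    check_nbrs = [[] for _ in range(m)]               # erased neighbours of each check
    deg = [0] * m                                     # residual degree of each check
    for v in range(n):
        if erased[v]:
            for c in var_nbrs[v]:
                check_nbrs[c].append(v)
                deg[c] += 1
    edges = sum(deg)
    stack = [c for c in range(m) if deg[c] == 1]
    r1 = len(stack)
    traj = [(edges, r1)]
    while stack:
        c = stack.pop()
        if deg[c] != 1:                               # stale entry: degree changed meanwhile
            continue
        v = next(u for u in check_nbrs[c] if erased[u])
        erased[v] = False                             # the degree-one check recovers v
        for c2 in var_nbrs[v]:                        # remove every edge of v
            if deg[c2] == 1:
                r1 -= 1
            deg[c2] -= 1
            edges -= 1
            if deg[c2] == 1:
                r1 += 1
                stack.append(c2)
        traj.append((edges, r1))
    return traj


traj = peeling_trajectory(n=2000, l=3, r=6, eps=0.45, seed=1)
print(len(traj), traj[0], traj[-1])
```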

This quantity can be related to the variance of the number of degree-one check nodes through the slope of the density evolution curve. Normalize all the quantities by $nL'(1)$, the number of edges in the graph. Consider the curve $r_1(\epsilon, x)$ given by density evolution, representing the fraction of degree-one check nodes in the residual graph, around the critical point for an erasure probability above the threshold (see Fig. 6). The real decoding process stops when hitting the $r_1 = 0$ axis. Think of a virtual process identical to the decoding for $r_1 > 0$ but that continues below the $r_1 = 0$ axis (for a proper definition, see [3]). A simple calculation shows that if the point at which the curve hits the $x$-axis varies by $\Delta x$ while keeping the minimum at $x^*$, it results in a variation of the height of the curve by

$$\Delta r_1 = \frac{\partial^2 r_1(\epsilon, x)}{\partial x^2}\Big|_{*}(x - x^*)\,\Delta x + o(x - x^*).$$

Taking the expectation of the square on both sides and letting $\epsilon$ tend to $\epsilon^*$, we obtain the normalized variance of $R_1$ at threshold

$$\begin{aligned}\delta^{(r_1,r_1)}\big|_{*} &= \lim_{\epsilon\to\epsilon^*}\left(\frac{\partial^2 r_1(\epsilon, x)}{\partial x^2}\Big|_{*}\right)^2(x - x^*)^2\,V^* + o\big((x - x^*)^2\big)\\ &= \left(\frac{x^*}{\epsilon^*\lambda'(y^*)}\right)^2\lim_{\epsilon\to\epsilon^*}\big(1 - \epsilon\lambda'(y)\rho'(\bar{x})\big)^2\,V^*.\end{aligned}$$

The transition between the first and the second line comes from the relationship between $\epsilon$ and $x$, with $r_1(\epsilon, x) = 0$, when $\epsilon$ tends to $\epsilon^*$.

The quantity $V^*$ differs from $V$ computed in the previous paragraphs because of the different order of the limits $n \to \infty$ and $\ell \to \infty$. However, it can be proved that the order does not matter and $V = V^*$. Using the result (19), we finally get

$$\delta^{(r_1,r_1)}\big|_{*} = \gamma\left(\frac{x^*}{\epsilon^*\lambda'(y^*)}\right)^2.$$

We conclude that the scaling parameter $\alpha$ can be obtained as

$$\alpha = \sqrt{\frac{\delta^{(r_1,r_1)}|_{*}}{L'(1)\,\big(\partial r_1/\partial\epsilon\big)^2\big|_{*}}} = \sqrt{\frac{\gamma}{L'(1)\,x^{*2}\lambda'(y^*)^2\rho'(\bar{x}^*)^2}}.$$
The last expression is equal to the one in Lemma ^ 
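As a numerical sanity check of the last expression, the sketch below evaluates $\gamma$ from its definition after Eq. (19), and then computes $\alpha = \sqrt{\gamma/(L'(1)x^{*2}\lambda'(y^*)^2\rho'(\bar{x}^*)^2)}$. It locates the critical point $x^*$ by a grid search for the minimizer of $x/\lambda(1-\rho(1-x))$ and sets $y^* = 1 - \rho(1 - x^*)$. Edge-perspective polynomials are represented as hypothetical {exponent: coefficient} dictionaries. For the (3,6)-regular ensemble the result should be close to the commonly quoted value $\alpha \approx 0.56$.

```python
import math


def ev(poly, x):
    """Evaluate sum_e c * x**e for an edge-perspective polynomial {exponent: c}."""
    return sum(c * x ** e for e, c in poly.items())


def dev(poly, x):
    """Derivative of ev(poly, x) with respect to x."""
    return sum(e * c * x ** (e - 1) for e, c in poly.items() if e > 0)


def scaling_alpha(lam, rho, grid=100_000):
    # Critical point: the grid x attaining eps* = min x / lambda(1 - rho(1 - x)).
    eps, x = min(((k / grid) / ev(lam, 1 - ev(rho, 1 - k / grid)), k / grid)
                 for k in range(1, grid + 1))
    xb, y = 1 - x, 1 - ev(rho, 1 - x)        # x-bar* and y*, so that x* = eps* lambda(y*)
    bracket_rho = (ev(rho, xb) ** 2 - ev(rho, xb ** 2)
                   + dev(rho, xb) * (1 - 2 * x * ev(rho, xb))
                   - xb ** 2 * dev(rho, xb ** 2))
    bracket_lam = ev(lam, y) ** 2 - ev(lam, y ** 2) - y ** 2 * dev(lam, y ** 2)
    gamma = eps ** 2 * dev(lam, y) ** 2 * (bracket_rho + eps ** 2 * dev(rho, xb) ** 2 * bracket_lam)
    avg_var_degree = 1 / sum(c / (e + 1) for e, c in lam.items())   # L'(1)
    alpha = math.sqrt(gamma / (avg_var_degree * x ** 2 * dev(lam, y) ** 2 * dev(rho, xb) ** 2))
    return eps, alpha


eps36, alpha36 = scaling_alpha({2: 1.0}, {5: 1.0})          # (3,6)-regular: lambda = x^2, rho = x^5
print(f"(3,6): eps* ~ {eps36:.6f}, alpha ~ {alpha36:.4f}")
print("Fig. 5 ensemble:", scaling_alpha({1: 0.5, 2: 0.5}, {1: 2 / 9, 2: 7 / 9}))
```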



IV. Message Variance 

Consider the ensemble LDPC$(n, \lambda, \rho)$ and transmission over the BEC of erasure probability $\epsilon$. As pointed out in the previous section, the scaling parameter $\alpha$ can be related to the (normalized) variance, with respect to the choice of the graph and the channel realization, of the number of erased edge messages sent from the variable nodes. Although what really matters is the limit of this quantity as the blocklength and the number of iterations tend to infinity (in this order), we start by providing an exact expression for a finite number of iterations $\ell$ (at infinite blocklength). At the end of this section, we shall take the limit $\ell \to \infty$.

To be definite, we initialize the iterative decoder by setting all check-to-variable messages to be erased at time 0. We let $x_i$ (respectively $y_i$) be the fraction of erased messages sent from variable to check nodes (from check to variable nodes) at iteration $i$, in the infinite blocklength limit. These values are determined by the density evolution [1] recursions $y_{i+1} = 1 - \rho(\bar{x}_i)$, with $x_i = \epsilon\lambda(y_i)$ (where we used the notation $\bar{x}_i = 1 - x_i$). The above initialization implies $y_0 = 1$. For future convenience we also set $x_i = y_i = 1$ for $i < 0$.
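These recursions are easy to iterate. Below is a minimal sketch that uses a hypothetical function name. It assumes the edge-perspective polynomials of the Fig. 5 ensemble, $\lambda(y) = (y+y^2)/2$ and $\rho(x) = (2x+7x^2)/9$, which come from $L(x) = \frac{3}{5}x^2 + \frac{2}{5}x^3$ and $R(x) = \frac{3}{10}x^2 + \frac{7}{10}x^3$. Below the threshold $x_\ell$ tends to zero. Above it, $x_\ell$ settles at a strictly positive fixed point.

```python
def density_evolution(eps, ell, lam, rho):
    """Return x[0..ell], y[0..ell] with y_0 = 1, x_i = eps*lam(y_i), y_{i+1} = 1 - rho(1 - x_i)."""
    x, y = [], [1.0]
    for i in range(ell + 1):
        x.append(eps * lam(y[i]))
        if i < ell:
            y.append(1.0 - rho(1.0 - x[i]))
    return x, y


lam = lambda t: (t + t * t) / 2              # lambda(y) for L(x) = 3/5 x^2 + 2/5 x^3
rho = lambda t: (2 * t + 7 * t * t) / 9      # rho(x) for R(x) = 3/10 x^2 + 7/10 x^3
for eps in (0.80, 0.90):                     # below and above eps* ~ 0.8496
    x, y = density_evolution(eps, 2000, lam, rho)
    print(eps, x[-1], y[-1])
```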

Using these variables, we have the following characterization of $V^{(\ell)}$, the (normalized) variance after $\ell$ iterations.

Lemma 3: Let $G$ be chosen uniformly at random from LDPC$(n, \lambda, \rho)$ and consider transmission over the BEC of erasure probability $\epsilon$. Label the $nL'(1)$ edges of $G$ in some fixed order by the elements of $\{1, \ldots, nL'(1)\}$. Assume that the receiver performs $\ell$ rounds of Belief Propagation decoding and let $\mu_i^{(\ell)}$ be equal to one if the message sent at the end of the $\ell$-th iteration along edge $i$ (from a variable node to a check node) is an erasure, and zero otherwise. Then

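$$\begin{aligned}
V^{(\ell)} &\equiv \lim_{n\to\infty}\frac{\mathbb{E}\big[\big(\sum_i \mu_i^{(\ell)}\big)^2\big] - \mathbb{E}\big[\sum_i \mu_i^{(\ell)}\big]^2}{nL'(1)}\\
&= \underbrace{x_\ell + x_\ell(1,0)\Big(\sum_{j=1}^{\ell}V(\ell)\cdots C(\ell-j)\Big)(1,0)^T}_{\text{edges in } T_1} + \underbrace{x_\ell^2\,\rho'(1)\sum_{i=0}^{\ell-1}\lambda'(1)^i\rho'(1)^i}_{\text{edges in } T_2}\\
&\quad + \underbrace{x_\ell(1,0)\Big(\sum_{j=1}^{2\ell}V(\ell)\cdots C(\ell-j)\Big)(1,0)^T}_{\text{edges in } T_3}\\
&\quad + \underbrace{(1,0)\sum_{j=0}^{\ell}\big(y_{\ell-j}U^*(j,j) + (1-y_{\ell-j})U^\circ(j,j)\big) + (1,0)\sum_{j=\ell+1}^{2\ell}V(\ell)\cdots C(2\ell-j)\big(y_{\ell-j}U^*(j,\ell) + (1-y_{\ell-j})U^\circ(j,\ell)\big)}_{\text{edges in } T_4}\\
&\quad - x_\ell W(\ell,1) + \sum_{i=1}^{\ell}F_i\big(x_iW(\ell,1) - \epsilon W(\ell,y_i)\big) - \sum_{i=1}^{\ell}F_i\,\epsilon\lambda'(y_i)\big(D(\ell,1)\rho(\bar{x}_{i-1}) - D(\ell,\bar{x}_{i-1})\big)\\
&\quad + \sum_{i=1}^{\ell-1}F_i\big(x_\ell + (1,0)V(\ell)\cdots C(0)V(0)(1,0)^T\big)(1,0)V(i)\cdots C(i-\ell)(1,0)^T\\
&\quad - \sum_{i=1}^{\ell-1}F_i\,x_i\big(x_\ell + (1,0)V(\ell)\cdots C(0)V(0)(1,0)^T\big)\big(\lambda'(1)\rho'(1)\big)^{\ell},
\end{aligned}\tag{20}$$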

where we introduced the shorthand

$$V(i)\cdots C(i-j) = \prod_{k=0}^{j-1}V(i-k)\,C(i-k-1). \tag{21}$$

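We define the matrices

$$V(i) = \begin{pmatrix}\epsilon\lambda'(y_i) & 0\\ \lambda'(1) - \epsilon\lambda'(y_i) & \lambda'(1)\end{pmatrix}, \tag{22}$$

$$C(i) = \begin{pmatrix}\rho'(1) & \rho'(1) - \rho'(\bar{x}_i)\\ 0 & \rho'(\bar{x}_i)\end{pmatrix}, \qquad i \ge 0, \tag{23}$$

$$V(i) = \begin{pmatrix}\lambda'(1) & 0\\ 0 & \lambda'(1)\end{pmatrix}, \tag{24}$$

$$C(i) = \begin{pmatrix}\rho'(1) & 0\\ 0 & \rho'(1)\end{pmatrix}, \qquad i < 0. \tag{25}$$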



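Further, $U^*(j,j)$, $U^*(j,\ell)$, $U^\circ(j,j)$ and $U^\circ(j,\ell)$ are computed through the following recursion. For $j \le \ell$, set

$$U^*(j,0) = \big(y_{\ell-j}\,\epsilon\lambda'(y_\ell),\ (1 - y_{\ell-j})\,\epsilon\lambda'(y_\ell)\big)^T, \qquad U^\circ(j,0) = (0,0)^T,$$

whereas for $j > \ell$, initialize by

$$\begin{aligned}U^*(j,j-\ell) &= (1,0)V(\ell)\cdots C(2\ell-j)(1,0)^T\begin{pmatrix}\epsilon\lambda'(y_{2\ell-j})\\ 0\end{pmatrix} + (1,0)V(\ell)\cdots C(2\ell-j)(0,1)^T\begin{pmatrix}\epsilon\big(\lambda'(1) - \lambda'(y_{2\ell-j})\big)\\ (1-\epsilon)\lambda'(1)\end{pmatrix},\\ U^\circ(j,j-\ell) &= (1,0)V(\ell)\cdots C(2\ell-j)(0,1)^T\begin{pmatrix}\epsilon\lambda'(1)\\ (1-\epsilon)\lambda'(1)\end{pmatrix}.\end{aligned}$$

The recursion is

$$U^*(j,k) = M_1(j,k)\,C(\ell-j+k-1)\,U^*(j,k-1) + M_2(j,k)\big[N_1(j,k-1)U^*(j,k-1) + N_2(j,k-1)U^\circ(j,k-1)\big], \tag{26}$$

$$U^\circ(j,k) = V(\ell-j+k)\big[N_1(j,k-1)U^*(j,k-1) + N_2(j,k-1)U^\circ(j,k-1)\big], \tag{27}$$

with

$$M_1(j,k) = \begin{pmatrix}\epsilon\lambda'(y_{\max\{\ell-k,\,\ell-j+k\}}) & 0\\ \mathbb{1}\{j \le 2k\}\,\epsilon\big(\lambda'(y_{\ell-k}) - \lambda'(y_{\ell-j+k})\big) & \epsilon\lambda'(y_{\ell-k})\end{pmatrix},$$

$$M_2(j,k) = \begin{pmatrix}\mathbb{1}\{j > 2k\}\,\epsilon\big(\lambda'(y_{\ell-j+k}) - \lambda'(y_{\ell-k})\big) & 0\\ \lambda'(1) - \epsilon\lambda'(y_{\min\{\ell-k,\,\ell-j+k\}}) & \lambda'(1) - \epsilon\lambda'(y_{\ell-k})\end{pmatrix},$$

$$N_1(j,k) = \begin{pmatrix}\rho'(1) - \rho'(\bar{x}_{\ell-k-1}) & \rho'(1) - \rho'(\bar{x}_{\max\{\ell-k-1,\,\ell-j+k\}})\\ 0 & \mathbb{1}\{j \le 2k\}\big(\rho'(\bar{x}_{\ell-j+k}) - \rho'(\bar{x}_{\ell-k-1})\big)\end{pmatrix},$$

$$N_2(j,k) = \begin{pmatrix}\rho'(\bar{x}_{\ell-k-1}) & \mathbb{1}\{j > 2k\}\big(\rho'(\bar{x}_{\ell-k-1}) - \rho'(\bar{x}_{\ell-j+k})\big)\\ 0 & \rho'(\bar{x}_{\min\{\ell-k-1,\,\ell-j+k\}})\end{pmatrix}.$$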

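The coefficients $F_i$ are given by

$$F_i = \prod_{k=i+1}^{\ell}\epsilon\lambda'(y_k)\rho'(\bar{x}_{k-1}), \tag{28}$$

and finally

$$W(\ell,\alpha) = \sum_{k=0}^{2\ell}(1,0)V(\ell)\cdots C(\ell-k)\,A(\ell,k,\alpha) + x_\ell\big(\alpha\lambda'(\alpha) + \lambda(\alpha)\big)\rho'(1)\sum_{i=0}^{\ell-1}\lambda'(1)^i\rho'(1)^i,$$

with $A(\ell,k,\alpha)$ equal to

$$A(\ell,k,\alpha) = \begin{cases}\begin{pmatrix}\epsilon\alpha y_{\ell-k}\lambda'(\alpha y_{\ell-k}) + \epsilon\lambda(\alpha y_{\ell-k})\\ \alpha\lambda'(\alpha) + \lambda(\alpha) - \epsilon\alpha y_{\ell-k}\lambda'(\alpha y_{\ell-k}) - \epsilon\lambda(\alpha y_{\ell-k})\end{pmatrix}, & k \le \ell,\\[2ex] \begin{pmatrix}\alpha\lambda'(\alpha) + \lambda(\alpha)\\ 0\end{pmatrix}, & k > \ell,\end{cases}$$

and

$$D(\ell,\alpha) = \sum_{k=1}^{2\ell}(1,0)V(\ell)\cdots C(\ell-k+1)V(\ell-k+1)\begin{pmatrix}\alpha\rho'(\alpha) + \rho(\alpha) - \alpha\bar{x}_{\ell-k}\rho'(\alpha\bar{x}_{\ell-k}) - \rho(\alpha\bar{x}_{\ell-k})\\ \alpha\bar{x}_{\ell-k}\rho'(\alpha\bar{x}_{\ell-k}) + \rho(\alpha\bar{x}_{\ell-k})\end{pmatrix} + x_\ell\big(\alpha\rho'(\alpha) + \rho(\alpha)\big)\sum_{i=0}^{\ell-1}\rho'(1)^i\lambda'(1)^i.$$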

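Proof: Expand $V^{(\ell)}$ in (20) as

$$\begin{aligned}V^{(\ell)} &= \lim_{n\to\infty}\frac{\mathbb{E}\big[\big(\sum_i\mu_i^{(\ell)}\big)^2\big] - \mathbb{E}\big[\sum_i\mu_i^{(\ell)}\big]^2}{nL'(1)} = \lim_{n\to\infty}\frac{nL'(1)\,\mathbb{E}\big[\mu_1^{(\ell)}\sum_i\mu_i^{(\ell)}\big] - \big(nL'(1)\,x_\ell\big)^2}{nL'(1)}\\ &= \lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_i\mu_i^{(\ell)}\Big] - nL'(1)\,x_\ell^2.\end{aligned}\tag{29}$$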



In the last step, we have used the fact that $x_\ell = \mathbb{E}[\mu_i^{(\ell)}]$ for any $i \in \{1, \ldots, nL'(1)\}$. Let us look more carefully at the first term of (29). After a finite number of iterations, each message $\mu_i^{(\ell)}$ depends upon the received symbols of a subset of the variable nodes. Since $\ell$ is kept finite, this subset remains finite in the large blocklength limit, and by standard arguments is a tree with high probability. As usual, we refer to the subgraph containing all such variable nodes, as well as the check nodes connecting them, as the computation tree for $\mu_i^{(\ell)}$.

It is useful to split the sum in the first term of Eq. (29) into two contributions: the first contribution stems from edges $i$ such that the computation trees of $\mu_1^{(\ell)}$ and $\mu_i^{(\ell)}$ intersect, and the second one stems from the remaining edges. More precisely, we write

$$\lim_{n\to\infty}\Big(\mathbb{E}\Big[\mu_1^{(\ell)}\sum_i\mu_i^{(\ell)}\Big] - nL'(1)x_\ell^2\Big) = \lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T}\mu_i^{(\ell)}\Big] + \lim_{n\to\infty}\Big(\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T^c}\mu_i^{(\ell)}\Big] - nL'(1)x_\ell^2\Big). \tag{30}$$

We define $T$ to be the subset of the variable-to-check edge indices such that if $i \in T$ then the computation trees of $\mu_i^{(\ell)}$ and $\mu_1^{(\ell)}$ intersect. This means that $T$ includes all the edges whose messages depend on some of the received values that are used in the computation of $\mu_1^{(\ell)}$. For convenience, we complete $T$ by including all edges that are connected to the same variable nodes as edges that are already in $T$. $T^c$ is the complement in $\{1, \ldots, nL'(1)\}$ of the set of indices $T$.

The set of indices $T$ depends on the number of iterations performed and on the graph realization. For any fixed $\ell$, $T$ is a tree with high probability in the large blocklength limit, and admits a simple characterization. It contains two sets of edges: the ones 'above' and the ones 'below' edge 1 (we call this the 'root' edge, and the variable node it is connected to the 'root' variable node). Edges above the root are the ones departing from a variable node that can be reached by a non-reversing path starting with the root edge and involving at most $\ell$ variable nodes (not including the root one). Edges below the root are the ones departing from a variable node that can be reached by a non-reversing path starting with the opposite of the root edge and involving at most $2\ell$ variable nodes (not including the root one). Edges departing from the root variable node are considered below the root (apart from the root edge itself).

Fig. 7: Graph representing all edges contained in $T$, for the case of $\ell = 2$. The small letters represent messages sent along the edges from a variable node to a check node and the capital letters represent variable nodes. The message $\mu_1^{(\ell)}$ is represented by (a).

We have depicted in Fig. 7 an example for the case of an irregular graph with $\ell = 2$. In the middle of the figure the edge (a) = 1 carries the message $\mu_1^{(\ell)}$. We will call $\mu_1^{(\ell)}$ the root message. We expand the graph starting from this root node. We consider $\ell$ variable node levels above the root. As an example, notice that the channel output on node (A) affects $\mu_1^{(\ell)}$ as well as the message sent on (b) at the $\ell$-th iteration. Therefore the corresponding computation trees intersect and, according to our definition, (b) $\in T$. On the other hand, the computation tree of (c) does not intersect the one of (a), but (c) $\in T$ because it shares a variable node with (b). We also expand $2\ell$ levels below the root. For instance, the value received on node (B) affects both $\mu_1^{(\ell)}$ and the message sent on (g) at the $\ell$-th iteration.

We compute the two terms in (30) separately. Define $S = \lim_{n\to\infty}\mathbb{E}\big[\mu_1^{(\ell)}\sum_{i\in T}\mu_i^{(\ell)}\big]$ and $S^c = \lim_{n\to\infty}\big(\mathbb{E}\big[\mu_1^{(\ell)}\sum_{i\in T^c}\mu_i^{(\ell)}\big] - nL'(1)x_\ell^2\big)$.

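1) Computation of $S$: Having defined $T$, we can further identify four different types of terms appearing in $S$ and write

$$\begin{aligned}S = \lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T}\mu_i^{(\ell)}\Big] &= \lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T_1}\mu_i^{(\ell)}\Big] + \lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T_2}\mu_i^{(\ell)}\Big]\\ &\quad + \lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T_3}\mu_i^{(\ell)}\Big] + \lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T_4}\mu_i^{(\ell)}\Big].\end{aligned}$$

Fig. 8: Size of $T$. It contains $\ell$ layers of variable nodes above the root edge and $2\ell$ layers of variable nodes below the root variable node. The gray area represents the computation tree of the message $\mu_1^{(\ell)}$. It contains $\ell$ layers of variable nodes below the root variable node.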



The subset $T_1 \subset T$ contains the edges above the root variable node that carry messages that point upwards (we include the root edge in $T_1$). In Fig. 7 the message sent on edge (b) is of this type. $T_2$ contains all edges above the root that point downwards, such as (c) in Fig. 7. $T_3$ contains the edges below the root that carry upward messages, like (d) and (f). Finally, $T_4$ contains the edges below the root variable node that point downwards, like (e) and (g).

Let us start with the simplest term, involving the messages in $T_2$. If $i \in T_2$, then the computation trees of $\mu_1^{(\ell)}$ and $\mu_i^{(\ell)}$ are with high probability disjoint in the large blocklength limit. In this case, the messages $\mu_1^{(\ell)}$ and $\mu_i^{(\ell)}$ do not depend on any common channel observation. The messages are nevertheless correlated: conditioned on the computation graph of the root edge, the degree distribution of the computation graph of edge $i$ is biased (assume that the computation graph of the root edge contains an unusually large number of high degree check nodes; then the computation graph of edge $i$ must contain in expectation an unusually low number of high degree check nodes). This correlation is, however, of order $O(1/n)$ and, since $T$ only contains a finite number of edges, the contribution of this correlation vanishes as $n \to \infty$. We therefore obtain

$$\lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T_2}\mu_i^{(\ell)}\Big] = x_\ell^2\,\rho'(1)\sum_{i=0}^{\ell-1}\lambda'(1)^i\rho'(1)^i,$$

where we used $\lim_{n\to\infty}\mathbb{E}[\mu_1^{(\ell)}\mu_i^{(\ell)}] = x_\ell^2$, and the fact that the expected number of edges in $T_2$ is $\rho'(1)\sum_{i=0}^{\ell-1}\lambda'(1)^i\rho'(1)^i$. For the edges in $T_1$ we obtain

$$\lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T_1}\mu_i^{(\ell)}\Big] = x_\ell + x_\ell(1,0)\Big(\sum_{j=1}^{\ell}V(\ell)\cdots C(\ell-j)\Big)(1,0)^T, \tag{31}$$

with the matrices $V(i)$ and $C(i)$ defined in Eqs. (22) and (23).
In order to understand this expression, consider the following case (cf. Fig. 9 for an illustration). We are at the $i$-th iteration of BP decoding and we pick an edge at random in the graph. It is connected to a check node of degree $j$ with probability $\rho_j$. Assume further that the message carried by this edge from the variable node to the check node (incoming message) is erased with probability $p$ and known with probability $\bar{p} = 1 - p$. We want to compute the expected numbers of erased and known messages sent out by the check node on its other edges (outgoing messages). If the incoming message is erased, then the number of erased outgoing messages is exactly $j - 1$. Averaging over the check node degrees gives us $\rho'(1)$. If the incoming message is known, then the expected number of erased outgoing messages is $(j-1)\big(1 - (1 - x_i)^{j-2}\big)$. Averaging over the check node degrees gives us $\rho'(1) - \rho'(1 - x_i)$. The expected number of erased outgoing messages is therefore $p\rho'(1) + \bar{p}\big(\rho'(1) - \rho'(1 - x_i)\big)$. Analogously, the expected number of known outgoing messages is $\bar{p}\rho'(\bar{x}_i)$. This result can be written using a matrix notation: the expected number of erased (respectively, known) outgoing messages is the first (respectively, second) component of the vector $C(i)(p, \bar{p})^T$, with $C(i)$ being defined in Eq. (23).

The situation is similar if we consider a variable node instead of the check node, with the matrix $V(i)$ replacing $C(i)$. The result is generalized to several layers of check and variable nodes by taking the product of the corresponding matrices, cf. Fig. 9.
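The counting argument behind $C(i)$ and $V(i)$ can be checked by simulation. The sketch below draws a single check node (or variable node) at the end of a random edge and erases the incoming message with probability $p$. Every other input is erased independently: with probability $x_i$ at a check node, and with probability $y_i$ for the other incoming messages (and $\epsilon$ for the channel) at a variable node. The sketch then averages the number of erased and known outgoing messages. It assumes the Fig. 5 ensemble in edge perspective ($\rho_2 = 2/9$, $\rho_3 = 7/9$, $\lambda_2 = \lambda_3 = 1/2$), and the values of $p$, $x_i$, $y_i$, $\epsilon$ are arbitrary illustrative choices. The averages should approach $C(i)(p,\bar{p})^T$ and $V(i)(p,\bar{p})^T$.

```python
import random


def draw_degree(edge_dist, rng):
    """Degree of the node at the end of a random edge; edge_dist maps degree -> probability."""
    u, acc = rng.random(), 0.0
    for d, prob in sorted(edge_dist.items()):
        acc += prob
        if u < acc:
            return d
    return max(edge_dist)                             # guard against rounding in the sum


def simulate_check(rho_dist, p, x, trials, rng):
    """Average (erased, known) outgoing messages of a check node, cf. C(i)(p, 1-p)^T."""
    erased = known = 0
    for _ in range(trials):
        j = draw_degree(rho_dist, rng)
        inc = rng.random() < p                        # incoming message erased?
        others = [rng.random() < x for _ in range(j - 1)]
        for e in range(j - 1):                        # output on edge e ignores its own input
            out = inc or any(others[:e] + others[e + 1:])
            erased += out
            known += not out
    return erased / trials, known / trials


def simulate_variable(lam_dist, p, y, eps, trials, rng):
    """Average (erased, known) outgoing messages of a variable node, cf. V(i)(p, 1-p)^T."""
    erased = known = 0
    for _ in range(trials):
        d = draw_degree(lam_dist, rng)
        inc = rng.random() < p                        # incoming check-to-variable message erased?
        chan = rng.random() < eps                     # channel observation erased?
        others = [rng.random() < y for _ in range(d - 1)]
        for e in range(d - 1):
            out = chan and inc and all(others[:e] + others[e + 1:])
            erased += out
            known += not out
    return erased / trials, known / trials


rho_dist, lam_dist = {2: 2 / 9, 3: 7 / 9}, {2: 0.5, 3: 0.5}
drho = lambda t: (2 + 14 * t) / 9                     # rho'(t)
dlam = lambda t: (1 + 2 * t) / 2                      # lambda'(t)
p, x, y, eps = 0.3, 0.4, 0.6, 0.85                    # illustrative values only
rng = random.Random(0)
sim_c = simulate_check(rho_dist, p, x, 100_000, rng)
pred_c = (p * drho(1) + (1 - p) * (drho(1) - drho(1 - x)), (1 - p) * drho(1 - x))
sim_v = simulate_variable(lam_dist, p, y, eps, 100_000, rng)
pred_v = (p * eps * dlam(y), p * (dlam(1) - eps * dlam(y)) + (1 - p) * dlam(1))
print("check:", sim_c, pred_c)
print("variable:", sim_v, pred_v)
```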



Fig. 9: Number of outgoing erased messages as a function of the probability of erasure of the incoming message.

The contribution of the edges in $T_1$ to $S$ is obtained by writing

$$\lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T_1}\mu_i^{(\ell)}\Big] = \lim_{n\to\infty}\mathbb{P}\{\mu_1^{(\ell)} = 1\}\,\mathbb{E}\Big[\sum_{i\in T_1}\mu_i^{(\ell)}\,\Big|\,\mu_1^{(\ell)} = 1\Big]. \tag{32}$$

The conditional expectation on the right hand side is given by

$$1 + \sum_{j=1}^{\ell}(1,0)V(\ell)\cdots C(\ell-j)(1,0)^T, \tag{33}$$

where the 1 is due to the fact that $\mathbb{E}[\mu_1^{(\ell)} \mid \mu_1^{(\ell)} = 1] = 1$, and each summand $(1,0)V(\ell)\cdots C(\ell-j)(1,0)^T$ is the expected number of erased messages in the $j$-th layer of edges in $T_1$, conditioned on the fact that the root edge is erased at iteration $\ell$ (notice that $\mu_1^{(\ell)} = 1$ implies $\mu_1^{(i)} = 1$ for all $i \le \ell$). Now multiplying (33) by $\mathbb{P}\{\mu_1^{(\ell)} = 1\} = x_\ell$ gives us (31).

The computation is similar for the edges in $T_3$ and results in

$$\lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T_3}\mu_i^{(\ell)}\Big] = x_\ell\sum_{j=1}^{2\ell}(1,0)V(\ell)\cdots C(\ell-j)(1,0)^T.$$

In this sum, when $j > \ell$, we have to evaluate the matrices $V(i)$ and $C(i)$ for negative indices using the definitions given in (24) and (25). The meaning of this case is simple: if $j > \ell$, then the observations in these layers do not influence the message $\mu_1^{(\ell)}$. Therefore, for these steps we only need to count the expected number of edges.
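The layer terms $(1,0)V(\ell)\cdots C(\ell-j)(1,0)^T$ that appear in the $T_1$ and $T_3$ contributions are easy to evaluate directly from Eqs. (21) to (25). The sketch below uses plain Python with 2×2 matrices stored as nested lists, and all names in it are hypothetical. It assumes the (3,6)-regular ensemble ($\lambda(x) = x^2$, $\rho(x) = x^5$) for concreteness. As a sanity check, at $\epsilon = 1$ every message is erased, so the $j$-th term should simply count edges, $(\lambda'(1)\rho'(1))^j = 10^j$. This holds also for $j > \ell$, where the diagonal matrices (24) and (25) take over.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


lam, rho = (lambda t: t ** 2), (lambda t: t ** 5)      # (3,6)-regular, edge perspective
dlam, drho = (lambda t: 2 * t), (lambda t: 5 * t ** 4)


def density_evolution(eps, ell):
    """x_0..x_ell and y_0..y_(ell+1) with y_0 = 1."""
    x, y = [], [1.0]
    for i in range(ell + 1):
        x.append(eps * lam(y[i]))
        y.append(1.0 - rho(1.0 - x[i]))
    return x, y


def make_V(i, eps, y):
    """V(i) as in Eqs. (22) and (24)."""
    if i < 0:
        return [[dlam(1.0), 0.0], [0.0, dlam(1.0)]]
    return [[eps * dlam(y[i]), 0.0], [dlam(1.0) - eps * dlam(y[i]), dlam(1.0)]]


def make_C(i, x):
    """C(i) as in Eqs. (23) and (25)."""
    if i < 0:
        return [[drho(1.0), 0.0], [0.0, drho(1.0)]]
    xb = 1.0 - x[i]
    return [[drho(1.0), drho(1.0) - drho(xb)], [0.0, drho(xb)]]


def layer_term(ell, j, eps, x, y):
    """(1,0) V(ell) C(ell-1) ... V(ell-j+1) C(ell-j) (1,0)^T, i.e. the shorthand (21)."""
    M = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(j):
        M = matmul(M, matmul(make_V(ell - k, eps, y), make_C(ell - k - 1, x)))
    return M[0][0]


ell = 3
x1, y1 = density_evolution(1.0, ell)                  # eps = 1: every message is erased
print([layer_term(ell, j, 1.0, x1, y1) for j in range(1, 2 * ell + 1)])
x4, y4 = density_evolution(0.4, ell)
print([layer_term(ell, j, 0.4, x4, y4) for j in range(1, 2 * ell + 1)])
```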

In order to obtain $S$, it remains to compute the contribution of the edges in $T_4$. This case is slightly more involved than the previous ones. Recall that $T_4$ includes all the edges that are below the root node and point downwards. In Fig. 7, edges (e) and (g) are elements of $T_4$. We claim that

$$\begin{aligned}\lim_{n\to\infty}\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T_4}\mu_i^{(\ell)}\Big] &= (1,0)\sum_{j=0}^{\ell}\big(y_{\ell-j}U^*(j,j) + (1 - y_{\ell-j})U^\circ(j,j)\big)\\ &\quad + (1,0)\sum_{j=\ell+1}^{2\ell}V(\ell)\cdots C(2\ell-j)\big(y_{\ell-j}U^*(j,\ell) + (1 - y_{\ell-j})U^\circ(j,\ell)\big).\end{aligned}\tag{34}$$

The first sum on the right hand side corresponds to messages $\mu_i^{(\ell)}$, $i \in T_4$, whose computation tree contains the root variable node. In the case of Fig. 7, where $\ell = 2$, the contribution of edge (e) would be counted in this first sum. The second term in (34) corresponds to edges $i \in T_4$ that are separated from the root edge by more than $\ell + 1$ variable nodes. In Fig. 7, edge (g) is of this type.

In order to understand the first sum in (34), consider the root edge and an edge $i \in T_4$ separated from the root edge by $j + 1$ variable nodes with $j \in \{0, \ldots, \ell\}$. For this edge in $T_4$, consider two messages it carries: the message that is sent from the variable node to the check node at the $\ell$-th iteration (this 'outgoing' message participates in our second moment calculation), and the one sent from the check node to the variable node at the $(\ell - j)$-th iteration ('incoming'). Define the two-component vector $U^*(j,j)$ as follows. Its first component is the joint probability that both the root and the outgoing messages are erased, conditioned on the fact that the incoming message is erased, multiplied by the expected number of edges in $T_4$ whose distance from the root is the same as for edge $i$. Its second component is the joint probability that the root message is erased and that the outgoing message is known, again conditioned on the incoming message being erased, and multiplied by the expected number of edges in $T_4$ at the same distance from the root. The vector $U^\circ(j,j)$ is defined in exactly the same manner, except that in this case we condition on the incoming message being known. The superscript $*$ or $\circ$ accounts respectively for the cases where the incoming message is erased or known.



From these definitions, it is clear that the contribution to $S$ of the edges that are in $T_4$ and separated from the root edge by $j + 1$ variable nodes with $j \in \{0, \ldots, \ell\}$ is $(1,0)\big(y_{\ell-j}U^*(j,j) + (1 - y_{\ell-j})U^\circ(j,j)\big)$. We still have to evaluate $U^*(j,j)$ and $U^\circ(j,j)$. In order to do this, we define the vectors $U^*(j,k)$ and $U^\circ(j,k)$ with $k < j$, analogously to the case $k = j$, except that this time we consider the root edge and an edge $i \in T_4$ separated from the root edge by $k + 1$ variable nodes. The outgoing message we consider is the one at the $(\ell - j + k)$-th iteration, and the incoming message we condition on is the one at the $(\ell - k)$-th iteration. It is easy to check that $U^*(j,j)$ and $U^\circ(j,j)$ can be computed in a recursive manner using $U^*(j,k)$ and $U^\circ(j,k)$. The initial conditions are

$$U^*(j,0) = \begin{pmatrix}y_{\ell-j}\,\epsilon\lambda'(y_\ell)\\ (1 - y_{\ell-j})\,\epsilon\lambda'(y_\ell)\end{pmatrix}, \qquad U^\circ(j,0) = \begin{pmatrix}0\\ 0\end{pmatrix},$$

and the recursion for $k \in \{1, \ldots, j\}$ is the one given in Lemma 3, cf. Eqs. (26) and (27). Notice that any received value which is on the path between the root edge and the edge in $T_4$ affects both the messages $\mu_1^{(\ell)}$ and $\mu_i^{(\ell)}$ on the corresponding edges. This is why this recursion is slightly more involved than the one for $T_1$. The situation is depicted in the left side of Fig. 10.



Fig. 10: The two situations that arise when computing the contribution of $T_4$. In the left side we show the case where the two edges are separated by at most $\ell + 1$ variable nodes, and in the right side the case where they are separated by more than $\ell + 1$ variable nodes.

Consider now the case of edges in $T_4$ that are separated from the root edge by more than $\ell + 1$ variable nodes, cf. the right picture in Fig. 10. In this case, not all of the received values along the path connecting the two edges affect both messages. We therefore have to adapt the previous recursion. We start from the root edge and compute the effect of the received values that only affect this message, resulting in an expression similar to the one we used to compute the contribution of $T_1$. This gives us the following initial condition

$$\begin{aligned}U^*(j,j-\ell) &= (1,0)V(\ell)\cdots C(2\ell-j)(1,0)^T\begin{pmatrix}\epsilon\lambda'(y_{2\ell-j})\\ 0\end{pmatrix} + (1,0)V(\ell)\cdots C(2\ell-j)(0,1)^T\begin{pmatrix}\epsilon\big(\lambda'(1) - \lambda'(y_{2\ell-j})\big)\\ (1-\epsilon)\lambda'(1)\end{pmatrix},\\ U^\circ(j,j-\ell) &= (1,0)V(\ell)\cdots C(2\ell-j)(0,1)^T\begin{pmatrix}\epsilon\lambda'(1)\\ (1-\epsilon)\lambda'(1)\end{pmatrix}.\end{aligned}$$

We then apply the recursion given in Lemma 3 to the intersection of the computation trees. We have to stop the recursion at $k = \ell$ (end of the intersection of the computation trees). It remains to account for the received values that only affect the messages on the edge in $T_4$. This is done by writing

$$(1,0)\sum_{j=\ell+1}^{2\ell}V(\ell)\cdots C(2\ell-j)\big(y_{\ell-j}U^*(j,\ell) + (1 - y_{\ell-j})U^\circ(j,\ell)\big),$$

which is the second term on the right hand side of Eq. (34).

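2) Computation of $S^c$: We still need to compute $S^c = \lim_{n\to\infty}\big(\mathbb{E}\big[\mu_1^{(\ell)}\sum_{i\in T^c}\mu_i^{(\ell)}\big] - nL'(1)x_\ell^2\big)$. Recall that, by definition, all the messages that are carried by edges in $T^c$ at the $\ell$-th iteration are functions of a set of received values distinct from the ones $\mu_1^{(\ell)}$ depends on. At first sight, one might think that such messages are independent from $\mu_1^{(\ell)}$. This is indeed the case when the Tanner graph is regular, i.e. for the degree distributions $\lambda(x) = x^{l-1}$ and $\rho(x) = x^{r-1}$. We then have

$$\begin{aligned}S^c &= \lim_{n\to\infty}\Big(\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T^c}\mu_i^{(\ell)}\Big] - nL'(1)x_\ell^2\Big) = \lim_{n\to\infty}\big(|T^c|\,x_\ell^2 - nL'(1)x_\ell^2\big)\\ &= \lim_{n\to\infty}\big((nL'(1) - |T|)x_\ell^2 - nL'(1)x_\ell^2\big) = -|T|\,x_\ell^2,\end{aligned}$$

with the cardinality of $T$ being $|T| = \sum_{i=0}^{2\ell}l(l-1)^i(r-1)^i + \sum_{i=1}^{\ell}l(l-1)^{i-1}(r-1)^i$.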
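The regular-case count of $|T|$ can be cross-checked by walking the tree level by level. In the sketch below (hypothetical helper names), every variable node in $T$ contributes its $l$ edges. One function evaluates the closed form for $|T|$ given in the text. The other counts variable nodes layer by layer from the description of $T$: $2\ell$ layers below the root variable node, and $\ell$ layers above the root edge reached through the check node on the root edge.

```python
def size_T_closed_form(l, r, ell):
    """|T| for the (l, r)-regular ensemble, as given in the text."""
    below = sum(l * ((l - 1) * (r - 1)) ** i for i in range(0, 2 * ell + 1))
    above = sum(l * (l - 1) ** (i - 1) * (r - 1) ** i for i in range(1, ell + 1))
    return below + above


def size_T_by_layers(l, r, ell):
    """Count the variable nodes of T layer by layer and multiply by l edges per node."""
    nodes, layer = 1, 1             # start from the root variable node
    for _ in range(2 * ell):        # below: l-1 downward edges per node, r-1 new nodes per check
        layer *= (l - 1) * (r - 1)
        nodes += layer
    layer = r - 1                   # above: the check on the root edge has r-1 other neighbours
    nodes += layer
    for _ in range(ell - 1):        # each variable node above has l-1 further upward edges
        layer *= (l - 1) * (r - 1)
        nodes += layer
    return l * nodes


for ell in (1, 2, 3):
    print(ell, size_T_closed_form(3, 6, ell), size_T_by_layers(3, 6, ell))
```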

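Consider now an irregular ensemble and let $G_T$ be the graph composed by the edges in $T$ and by the variable and check nodes connecting them. Unlike in the regular case, $G_T$ is not fixed anymore and depends (in its size as well as in its structure) on the graph realization. It is clear that the root message $\mu_1^{(\ell)}$ depends on the realization of $G_T$. We will see that the messages carried by the edges in $T^c$ also depend on the realization of $G_T$. On the other hand, they are clearly conditionally independent given $G_T$ (because, conditioned on $G_T$, $\mu_1^{(\ell)}$ is just a deterministic function of the received symbols in its computation tree). If we let $j$ denote a generic edge in $T^c$ (for instance, the one with the lowest index), we can therefore write

$$\begin{aligned}S^c &= \lim_{n\to\infty}\Big(\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T^c}\mu_i^{(\ell)}\Big] - nL'(1)x_\ell^2\Big)\\ &= \lim_{n\to\infty}\Big(\mathbb{E}_{G_T}\Big[\mathbb{E}\Big[\mu_1^{(\ell)}\sum_{i\in T^c}\mu_i^{(\ell)}\,\Big|\,G_T\Big]\Big] - nL'(1)x_\ell^2\Big)\\ &= \lim_{n\to\infty}\Big(\mathbb{E}_{G_T}\big[|T^c|\,\mathbb{E}[\mu_1^{(\ell)} \mid G_T]\,\mathbb{E}[\mu_j^{(\ell)} \mid G_T]\big] - nL'(1)x_\ell^2\Big)\\ &= \lim_{n\to\infty}\Big(\mathbb{E}_{G_T}\big[(nL'(1) - |T|)\,\mathbb{E}[\mu_1^{(\ell)} \mid G_T]\,\mathbb{E}[\mu_j^{(\ell)} \mid G_T]\big] - nL'(1)x_\ell^2\Big)\\ &= \lim_{n\to\infty}nL'(1)\Big(\mathbb{E}_{G_T}\big[\mathbb{E}[\mu_1^{(\ell)} \mid G_T]\,\mathbb{E}[\mu_j^{(\ell)} \mid G_T]\big] - x_\ell^2\Big) - \lim_{n\to\infty}\mathbb{E}_{G_T}\big[|T|\,\mathbb{E}[\mu_1^{(\ell)} \mid G_T]\,\mathbb{E}[\mu_j^{(\ell)} \mid G_T]\big].\end{aligned}\tag{35}$$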

We need to compute $\mathbb{E}[\mu_j^{(\ell)} \mid G_T]$ for a fixed realization of $G_T$ and an arbitrary edge $j$ taken from $T^c$ (the expectation does not depend on $j \in T^c$, so we can consider it as a random edge as well). This value differs slightly from $x_\ell$ for two reasons. The first one is that we are dealing with a fixed-size Tanner graph (although taking later the limit $n \to \infty$), and therefore the degrees of the nodes in $G_T$ are correlated with the degrees of nodes in its complement $G \setminus G_T$. Intuitively, if $G_T$ contains an unusually large number of high degree variable nodes, the rest of the graph will contain an unusually small number of high degree variable nodes, affecting the average $\mathbb{E}[\mu_j^{(\ell)} \mid G_T]$. The second reason why $\mathbb{E}[\mu_j^{(\ell)} \mid G_T]$ differs from $x_\ell$ is that certain messages carried by edges in $T^c$ which are close to $G_T$ are affected by messages that flow out of $G_T$.

The first effect can be characterized by computing the degree distribution on $G \setminus G_T$ as a function of $G_T$. Define $V_i(G_T)$ (respectively $C_i(G_T)$) to be the number of variable nodes (check nodes) of degree $i$ in $G_T$, and let $V(x; G_T) = \sum_i V_i(G_T)x^i$ and $C(x; G_T) = \sum_i C_i(G_T)x^i$. We shall also need the derivatives of these polynomials: $V'(x; G_T) = \sum_i iV_i(G_T)x^{i-1}$ and $C'(x; G_T) = \sum_i iC_i(G_T)x^{i-1}$. It is easy to check that if we take a bipartite graph having a variable degree distribution $\lambda(x)$ and remove a variable node of degree $i$, the variable degree distribution changes by

$$\delta_i\lambda(x) = \frac{i\lambda(x) - ix^{i-1}}{nL'(1)} + O(1/n^2).$$

Therefore, if we remove $G_T$ from the bipartite graph, the remaining graph will have a variable-perspective degree distribution that differs from the original by

$$\delta\lambda(x) = \frac{V'(1; G_T)\lambda(x) - V'(x; G_T)}{nL'(1)} + O(1/n^2).$$

In the same way, the check degree distribution changes, when we remove $G_T$, by

$$\delta\rho(x) = \frac{C'(1; G_T)\rho(x) - C'(x; G_T)}{nL'(1)} + O(1/n^2).$$

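If the degree distributions change by $\delta\lambda(x)$ and $\delta\rho(x)$, the fraction $x_\ell$ of erased variable-to-check messages changes by $\delta x_\ell$. To the linear order we get

$$\begin{aligned}\delta x_\ell &= \sum_{i=1}^{\ell}\prod_{k=i+1}^{\ell}\epsilon\lambda'(y_k)\rho'(\bar{x}_{k-1})\big[\epsilon\,\delta\lambda(y_i) - \epsilon\lambda'(y_i)\,\delta\rho(\bar{x}_{i-1})\big]\\ &= \frac{1}{nL'(1)}\sum_{i=1}^{\ell}F_i\big[x_iV'(1; G_T) - \epsilon V'(y_i; G_T) - \epsilon\lambda'(y_i)\big(C'(1; G_T)\rho(\bar{x}_{i-1}) - C'(\bar{x}_{i-1}; G_T)\big)\big] + O(1/n^2),\end{aligned}$$

with $F_i$ defined as in Eq. (28).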
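The linearization can be tested by finite differences. The sketch below perturbs both degree distributions of the Fig. 5 ensemble ($\lambda(y) = (y+y^2)/2$, $\rho(x) = (2x+7x^2)/9$) along the assumed direction $\delta\lambda(x) = \delta\rho(x) = \eta(x - x^2)$, which keeps $\lambda(1) = \rho(1) = 1$. It reruns density evolution and compares $(x_\ell(\eta) - x_\ell(-\eta))/(2\eta)$ with the first line of the formula above divided by $\eta$. The values of $\epsilon$, $\ell$ and $\eta$ are arbitrary choices.

```python
def de_final_x(eps, ell, lam, rho):
    """x_ell from y_0 = 1, x_i = eps*lam(y_i), y_{i+1} = 1 - rho(1 - x_i)."""
    y = 1.0
    for _ in range(ell):
        y = 1.0 - rho(1.0 - eps * lam(y))
    return eps * lam(y)


def perturbed(eta):
    """Degree distributions shifted by eta * (t - t^2); eta = 0 gives the Fig. 5 ensemble."""
    lam = lambda t: (t + t * t) / 2 + eta * (t - t * t)
    rho = lambda t: (2 * t + 7 * t * t) / 9 + eta * (t - t * t)
    return lam, rho


def linear_prediction(eps, ell):
    """sum_i F_i [eps*dlam(y_i) - eps*lambda'(y_i)*drho(xbar_{i-1})] per unit eta."""
    lam, rho = perturbed(0.0)
    dlam = lambda t: (1 + 2 * t) / 2            # lambda'(t)
    drho = lambda t: (2 + 14 * t) / 9           # rho'(t)
    shape = lambda t: t - t * t                 # delta lambda = delta rho = eta * shape
    x, y = [], [1.0]
    for i in range(ell + 1):
        x.append(eps * lam(y[i]))
        y.append(1.0 - rho(1.0 - x[i]))
    total = 0.0
    for i in range(1, ell + 1):
        F = 1.0                                 # F_i = prod_{k=i+1}^{ell} eps lambda'(y_k) rho'(xbar_{k-1})
        for k in range(i + 1, ell + 1):
            F *= eps * dlam(y[k]) * drho(1.0 - x[k - 1])
        total += F * (eps * shape(y[i]) - eps * dlam(y[i]) * shape(1.0 - x[i - 1]))
    return total


eps, ell, eta = 0.7, 10, 1e-6
numeric = (de_final_x(eps, ell, *perturbed(eta)) - de_final_x(eps, ell, *perturbed(-eta))) / (2 * eta)
print(numeric, linear_prediction(eps, ell))
```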

Imagine now that we fix the degree distribution of $G \setminus G_T$. The conditional expectation $\mathbb{E}[\mu_j^{(\ell)} \mid G_T]$ still depends on the detailed structure of $G_T$. The reason is that the messages that flow out of the boundary of $G_T$ (both their number and value) depend on $G_T$, and these messages affect messages in $G \setminus G_T$. Since the fraction of such (boundary) messages is $O(1/n)$, their effect can again be evaluated perturbatively.

Call $B$ the number of edges forming the boundary of $G_T$ (edges emanating upwards from the variable nodes that are $\ell$ levels above the root edge, and emanating downwards from the variable nodes that are $2\ell$ levels below the root variable node), and let $B_i^*$ be the number of erased messages carried at the $i$-th iteration by these edges. Let $x_i$ be the fraction of erased messages, incoming to check nodes in $G \setminus G_T$ from variable nodes in $G \setminus G_T$, at the $i$-th iteration. Taking into account the messages coming from variable nodes in $G_T$ (i.e. corresponding to boundary edges), the overall fraction will be $x_i + \delta x_i$, where

$$\delta x_i = \frac{B_i^* - Bx_i}{nL'(1)} + O(1/n^2).$$

This expression simply comes from the fact that, at the $i$-th iteration, we have $nL'(1) - |T| = nL'(1)(1 + O(1/n))$ messages in the complement of $G_T$, of which a fraction $x_i$ is erased, plus $B$ messages incoming from the boundary, of which $B_i^*$ are erasures.

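Combining the two above effects, we have for an edge $j \in T^c$

$$\begin{aligned}\mathbb{E}[\mu_j^{(\ell)} \mid G_T] = x_\ell &+ \frac{1}{nL'(1)}\sum_{i=1}^{\ell}F_i\big[x_iV'(1; G_T) - \epsilon V'(y_i; G_T) - \epsilon\lambda'(y_i)\big(C'(1; G_T)\rho(\bar{x}_{i-1}) - C'(\bar{x}_{i-1}; G_T)\big)\big]\\ &+ \frac{1}{nL'(1)}\sum_{i=1}^{\ell-1}F_i\big(B_i^* - Bx_i\big) + O(1/n^2).\end{aligned}$$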

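We can now use this expression in (35) to obtain

$$\begin{aligned}S^c &= \lim_{n\to\infty}nL'(1)\Big(\mathbb{E}_{G_T}\big[\mathbb{E}[\mu_1^{(\ell)} \mid G_T]\,\mathbb{E}[\mu_j^{(\ell)} \mid G_T]\big] - x_\ell^2\Big) - \lim_{n\to\infty}\mathbb{E}_{G_T}\big[|T|\,\mathbb{E}[\mu_1^{(\ell)} \mid G_T]\,\mathbb{E}[\mu_j^{(\ell)} \mid G_T]\big]\\ &= \sum_{i=1}^{\ell}F_i\big(x_i\,\mathbb{E}[\mu_1^{(\ell)}V'(1; G_T)] - \epsilon\,\mathbb{E}[\mu_1^{(\ell)}V'(y_i; G_T)]\big)\\ &\quad - \sum_{i=1}^{\ell}F_i\,\epsilon\lambda'(y_i)\big(\mathbb{E}[\mu_1^{(\ell)}C'(1; G_T)]\,\rho(\bar{x}_{i-1}) - \mathbb{E}[\mu_1^{(\ell)}C'(\bar{x}_{i-1}; G_T)]\big)\\ &\quad + \sum_{i=1}^{\ell-1}F_i\,\mathbb{E}[\mu_1^{(\ell)}B_i^*] - \sum_{i=1}^{\ell-1}F_i\,x_i\,\mathbb{E}[\mu_1^{(\ell)}B] - x_\ell\,\mathbb{E}[\mu_1^{(\ell)}V'(1; G_T)],\end{aligned}$$

where we took the limit $n \to \infty$ and replaced $|T|$ by $V'(1; G_T)$.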

It is clear what each of these values represents. For example, $\mathbb{E}[\mu_1^{(\ell)}V'(1; G_T)]$ is the expectation of $\mu_1^{(\ell)}$ times the number of edges that are in $G_T$. Each of these terms can be computed through recursions that are similar in spirit to the ones used to compute $S$. These recursions are provided in the body of Lemma 3. We will just explain in further detail how the terms $\mathbb{E}[\mu_1^{(\ell)}B]$ and $\mathbb{E}[\mu_1^{(\ell)}B_i^*]$ are computed.

We claim that

$$\mathbb{E}[\mu_1^{(\ell)}B] = \big(x_\ell + (1,0)V(\ell)\cdots C(0)V(0)(1,0)^T\big)\big(\lambda'(1)\rho'(1)\big)^{\ell}.$$

The reason is that $\mu_1^{(\ell)}$ depends only on the realization of its computation tree and not on the whole $G_T$. From the definitions of $G_T$, the boundary of $G_T$ is on average $(\lambda'(1)\rho'(1))^{\ell}$ times larger than the boundary of the computation tree. Finally, the expectation of $\mu_1^{(\ell)}$ times the number of edges in the boundary of its computation tree is computed analogously to what has been done for the contribution of $S$. The result is $x_\ell + (1,0)V(\ell)\cdots C(0)V(0)(1,0)^T$ (the term $x_\ell$ accounts for the root edge, and the other one for the lower boundary of the computation tree). Multiplying this by $(\lambda'(1)\rho'(1))^{\ell}$, we obtain the above expression.

The calculation of $\mathbb{E}[\mu_1^{(\ell)}B_i^*]$ is similar. We start by computing the expectation of $\mu_1^{(\ell)}$ multiplied by the number of edges in the boundary of its computation tree. This number has to be multiplied by $(1,0)V(i)C(i-1)\cdots V(i-\ell+1)C(i-\ell)(1,0)^T$ to account for what happens between the boundary of the computation tree and the boundary of $G_T$. We therefore obtain

$$\mathbb{E}[\mu_1^{(\ell)}B_i^*] = \big(x_\ell + (1,0)V(\ell)\cdots C(0)V(0)(1,0)^T\big)\,(1,0)V(i)\cdots C(i-\ell)(1,0)^T.$$

■

The expression provided in the above lemma has been used to plot $V^{(\ell)}$ for $\epsilon \in (0, 1)$ and for several values of $\ell$ in the case of an irregular ensemble in Fig. 5.

It remains to determine the asymptotic behavior of this quantity as the number of iterations tends to infinity.

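Lemma 4: Let $G$ be chosen uniformly at random from LDPC$(n, \lambda, \rho)$ and consider transmission over the BEC of erasure probability $\epsilon$. Label the $nL'(1)$ edges of $G$ in some fixed order by the elements of $\{1, \ldots, nL'(1)\}$. Set $\mu_i^{(\ell)}$ equal to one if the message along edge $i$ from variable to check node, after $\ell$ iterations, is an erasure, and equal to zero otherwise. Then

$$\begin{aligned}\lim_{\ell\to\infty}\lim_{n\to\infty}\frac{\mathbb{E}\big[\big(\sum_i\mu_i^{(\ell)}\big)^2\big] - \mathbb{E}\big[\sum_i\mu_i^{(\ell)}\big]^2}{nL'(1)} &= \frac{\epsilon^2\lambda'(y)^2\big(\rho(\bar{x})^2 - \rho(\bar{x}^2) + \rho'(\bar{x})(1 - 2x\rho(\bar{x})) - \bar{x}^2\rho'(\bar{x}^2)\big)}{\big(1 - \epsilon\lambda'(y)\rho'(\bar{x})\big)^2}\\ &\quad + \frac{\epsilon^2\lambda'(y)^2\rho'(\bar{x})^2\big(\epsilon^2\lambda(y)^2 - \epsilon^2\lambda(y^2) - y^2\epsilon^2\lambda'(y^2)\big)}{\big(1 - \epsilon\lambda'(y)\rho'(\bar{x})\big)^2}\\ &\quad + \frac{\big(x - \epsilon^2\lambda(y^2) - y^2\epsilon^2\lambda'(y^2)\big)\big(1 + \epsilon\lambda'(y)\rho'(\bar{x})\big) + \epsilon y^2\lambda'(y)}{1 - \epsilon\lambda'(y)\rho'(\bar{x})},\end{aligned}$$

where $x$ and $y$ denote the limits of $x_\ell$ and $y_\ell$ as $\ell \to \infty$, i.e. the fixed point reached by density evolution.

The proof is a (particularly tedious) calculus exercise, and we omit it here for the sake of space.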
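The limiting expression can be evaluated directly once the density-evolution fixed point is known. The sketch below does this for the Fig. 5 ensemble, $L(x) = \frac{3}{5}x^2 + \frac{2}{5}x^3$ and $R(x) = \frac{3}{10}x^2 + \frac{7}{10}x^3$, i.e. $\lambda(y) = (y+y^2)/2$ and $\rho(x) = (2x+7x^2)/9$ in edge perspective. It approximates the fixed point by iterating density evolution from $y_0 = 1$. Below threshold the fixed point is $x = y = 0$ and every term vanishes, consistent with the zero variance seen in Fig. 5. Just above the quoted $\epsilon^* \approx 0.8495897455$, the product $V \cdot (\epsilon - \epsilon^*)$ should level off, which is the divergence predicted by Eq. (19).

```python
EPS_STAR = 0.8495897455                      # threshold quoted for this ensemble

lam = lambda t: (t + t * t) / 2
dlam = lambda t: (1 + 2 * t) / 2
rho = lambda t: (2 * t + 7 * t * t) / 9
drho = lambda t: (2 + 14 * t) / 9


def fixed_point(eps, iters=500_000):
    """Approximate lim y_l by iterating y <- 1 - rho(1 - eps*lam(y)) from y = 1."""
    y = 1.0
    for _ in range(iters):
        y = 1.0 - rho(1.0 - eps * lam(y))
    return y


def limiting_variance(eps):
    """Right-hand side of Lemma 4 evaluated at the density-evolution fixed point."""
    y = fixed_point(eps)
    x = eps * lam(y)
    xb = 1.0 - x
    g = 1.0 - eps * dlam(y) * drho(xb)       # 1 - eps lambda'(y) rho'(xbar)
    e2 = eps * eps
    t1 = e2 * dlam(y) ** 2 * (rho(xb) ** 2 - rho(xb ** 2)
                              + drho(xb) * (1 - 2 * x * rho(xb)) - xb ** 2 * drho(xb ** 2))
    t2 = e2 * dlam(y) ** 2 * drho(xb) ** 2 * (e2 * lam(y) ** 2 - e2 * lam(y * y)
                                              - y * y * e2 * dlam(y * y))
    t3 = ((x - e2 * lam(y * y) - y * y * e2 * dlam(y * y)) * (1 + eps * dlam(y) * drho(xb))
          + eps * y * y * dlam(y))
    return (t1 + t2) / g ** 2 + t3 / g


below = limiting_variance(0.80)
scaled = {d: limiting_variance(EPS_STAR + d) * d for d in (1e-4, 1e-5)}
print(below, scaled)
```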